1 <matsubara> #startmeeting
2 <MootBot> Meeting started at 10:00. The chair is matsubara.
3 <MootBot> Commands Available: [TOPIC], [IDEA], [ACTION], [AGREED], [LINK], [VOTE]
4 <Ursinha> roll call,roll call
5 <matsubara> my firefox died
6 <sinzui> me
7 * stub belches
8 <matsubara> hang on a second please
9 <henninge> me
10 <Ursinha> poor matsubara
11 * jml eavesdrops
12 * bigjools wafts stub's belch away
13 <matsubara> Welcome to this week's Launchpad Production Meeting. For the next 45 minutes or so, we'll be coordinating the resolution of specific Launchpad bugs and issues.
14 <rockstar> ni
15 <matsubara> [TOPIC] Roll Call
16 <MootBot> New Topic: Roll Call
17 <Ursinha> meeee
18 <matsubara> Not on the Launchpad Dev team? Welcome! Come "me" with the rest of us!
19 <bigjools> me
20 <stub> me
21 <henninge> me again
22 <intellectronica> i
23 <matsubara> flacoste, hi
24 <flacoste> me
25 <matsubara> herb, hi
26 <herb> me
27 <matsubara> ok, everyone here.
28 <matsubara> [TOPIC] Agenda
29 <MootBot> New Topic: Agenda
30 <matsubara> * Actions from last meeting
31 <matsubara> * Oops report & Critical Bugs & Broken scripts
32 <matsubara> * Operations report (mthaddon/herb/spm)
33 <matsubara> * DBA report (stub)
34 <matsubara> [TOPIC] * Actions from last meeting
35 <MootBot> New Topic: * Actions from last meeting
36 <matsubara> * matsubara to chase rockstar about failure on updatebranches script
37 <matsubara> * stub to have a go at bug 354593, with mars' help if needed
38 <matsubara> * stub to fix bug 310818
39 <matsubara> * mars to take a look at OOPS-1307J16
40 <matsubara> * Discuss the solution proposed by gary_poster after the meeting, about ExpatErrors and bug 403606
41 <matsubara> * mars and stub to discuss the Disconnection and OperationalErrors after the meeting
42 <jml> me
43 <ubottu> Launchpad bug 354593 in launchpad-foundations "SSO exceptions views need proper branding" [High,Triaged] https://launchpad.net/bugs/354593
44 <ubottu> Launchpad bug 310818 in launchpad-foundations "Oops report does not always log timed-out query" [High,In progress] https://launchpad.net/bugs/310818
45 <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=1307J16
46 <Ursinha> yay, jml
47 <matsubara> I suck, I didn't chase rockstar about the updatebranches script failures
48 <ubottu> Launchpad bug 403606 in launchpad-registry "ExpatError errors should be handled to not generate the OOPSes" [Undecided,New] https://launchpad.net/bugs/403606
49 <rockstar> matsubara, I thought we agreed that mwhudson would be better to chase on it.
50 <jml> matsubara, got a URL for the failure?
51 <matsubara> otoh, the script is not failing anymore...
52 <rockstar> matsubara, I know mwhudson was looking at it on his Tuesday.
53 <rockstar> jml would be good to ask as well.
55 <matsubara> rockstar, all right. I'll talk to jml and mwhudson later on today
56 <matsubara> [action] * matsubara to chase mwhudson/jml about failure on updatebranches script
57 <MootBot> ACTION received: * matsubara to chase mwhudson/jml about failure on updatebranches script
58 <rockstar> matsubara, jml is here right now. :)
59 <matsubara> jml, I'll get you a URL for the scripts after the meeting. I need to trawl my emails to find it
60 <jml> matsubara, ok. thanks.
61 <matsubara> stub, how's 354593 fix coming along?
62 <flacoste> why is this High again?
63 <matsubara> I wonder if mars had time to look over OOPS-1307J16
64 <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=1307J16
65 <matsubara> flacoste, do you know ^?
66 <flacoste> hmm, i put it as such
67 <flacoste> any reason it should be?
68 <matsubara> flacoste, according to the bug history you made it high :-)
69 <flacoste> debranding of the SSO is a U1/ISD affair anyway
70 <stub> matsubara: Slow. I need to discuss with people how to actually do it - maybe next week on the sprint if I get time.
71 * sinzui agrees with flacoste
72 <flacoste> stub: i think we should try to get stu and James to do it :-)
73 <flacoste> especially stu; it would be a good test case for knowledge transfer
74 <stub> Anything that means I don't have to work out how ZPT macros work is fine by me.
75 <flacoste> +1
76 <matsubara> Ursinha, what's up with "Discuss the solution proposed by gary_poster after the meeting, about ExpatErrors and bug 403606"?
77 <ubottu> Launchpad bug 403606 in launchpad-registry "ExpatError errors should be handled to not generate the OOPSes" [Undecided,New] https://launchpad.net/bugs/403606
78 <Ursinha> matsubara, the ExpatErrors were being discussed by mars and gary
79 <gary_poster> matsubara: that's now registry. it actually is a legitimate oops
80 <matsubara> [action] stub to delegate bug 354593 to ISD
81 <ubottu> Launchpad bug 354593 in launchpad-foundations "SSO exceptions views need proper branding" [High,Triaged] https://launchpad.net/bugs/354593
82 <MootBot> ACTION received: stub to delegate bug 354593 to ISD
83 <gary_poster> it indicates a problem with mailman integration
84 <sinzui> I will ask barry to look into bug 403606
85 <ubottu> Launchpad bug 403606 in launchpad-registry "ExpatError errors should be handled to not generate the OOPSes" [Undecided,New] https://launchpad.net/bugs/403606
86 <matsubara> stub, you recently fixed a DisconnectionError bug. Was it related to the errors you discussed with mars? Is that action item done now?
87 <matsubara> thanks sinzui and gary_poster
88 <gary_poster> :-)
89 <stub> matsubara: I landed code to log OOPS reports on DisconnectionError before retrying the request. Is that what you mean?
90 <matsubara> stub, I mean: "* mars and stub to discuss the Disconnection and OperationalErrors after the meeting"
91 <Ursinha> stub, is that what caused the TransactionRollbackError oopses?
92 <stub> We discussed. I don't recall much about the conversation though :)
93 <matsubara> :-)
94 <stub> Ursinha: That fix was, yes. I've got another branch that turns the volume down so we don't log the TransactionCommitErrors
95 <matsubara> [action] sinzui to ask barry to fix bug 403606
96 <ubottu> Launchpad bug 403606 in launchpad-registry "ExpatError errors should be handled to not generate the OOPSes" [Undecided,New] https://launchpad.net/bugs/403606
97 <MootBot> ACTION received: sinzui to ask barry to fix bug 403606
98 <Ursinha> stub, good, I filed bug 409907 for that
99 <ubottu> Launchpad bug 409907 in launchpad-foundations "TransactionRollbackErrors may prevent us to detect real issues" [Undecided,New] https://launchpad.net/bugs/409907
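stub's DisconnectionError fix logs an OOPS report before the request is retried, so a disconnection is still visible even when the retry succeeds. A minimal Python sketch of that log-then-retry pattern, assuming a simple callable handler; `run_with_retry`, `log_oops`, the retry limit and the local `DisconnectionError` stand-in are all hypothetical and are not Launchpad's or Storm's actual code:

```python
class DisconnectionError(Exception):
    """Hypothetical stand-in for a lost database connection."""


def run_with_retry(handler, log_oops, max_attempts=3):
    """Call handler(), retrying after a DisconnectionError.

    Each disconnection is passed to log_oops(error, attempt) before the
    retry, so the failure still shows up in the OOPS reports. The final
    DisconnectionError is re-raised if every attempt fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return handler()
        except DisconnectionError as error:
            # Record the failure first; silently retrying would hide it.
            log_oops(error, attempt)
            if attempt == max_attempts:
                raise
```

With `max_attempts=3`, a handler that disconnects twice and then succeeds returns its result after two OOPS entries have been logged.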
100 <matsubara> Ursinha, is there a bug for OOPS-1307J16?
101 <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=1307J16
102 <Ursinha> matsubara, I didn't open one, because we needed to know what was going on over there
103 <Ursinha> before opening the bug
104 <Ursinha> so mars was going to investigate that
105 <Ursinha> I don't recall seeing those anymore
106 <matsubara> [action] ursinha to chase mars about OOPS-1307J16 and file a bug about it
107 <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=1307J16
108 <MootBot> ACTION received: ursinha to chase mars about OOPS-1307J16 and file a bug about it
110 <matsubara> I think that's all for last meeting's action items
111 <matsubara> thanks everyone
112 <matsubara> [TOPIC] * Oops report & Critical Bugs & Broken scripts
113 <MootBot> New Topic: * Oops report & Critical Bugs & Broken scripts
114 <Ursinha> there are two issues to discuss
115 <Ursinha> one was about bug 409907, that I already mentioned to stub and it's being handled
116 <ubottu> Launchpad bug 409907 in launchpad-foundations "TransactionRollbackErrors may prevent us to detect real issues" [Undecided,New] https://launchpad.net/bugs/409907
117 <Ursinha> the other is about the select replication_lag() timeouts we're having
118 <Ursinha> mthaddon also reported problems that may or may not be related to that
119 <Ursinha> I don't know if there's much to be discussed at this point, because it seems we need to fix oops reports first to be able to see the real problem here
120 <Ursinha> is that correct, stub?
123 <matsubara> should we request a CP for the branch that fixes the oops log?
124 <intellectronica> given that we're skipping a release, that's probably a good idea
125 <Ursinha> flacoste, I spoke with jtv yesterday about those, and he also said they were unlikely to be caused by the translations changes (possible but unlikely)
126 <stub> I landed code today that should tell us more about whether the timeout is actually occurring due to blocking on the database, or elsewhere.
128 <Ursinha> stub, should we request a CP?
129 <flacoste> yeah, i really think a CP is a good idea
130 <Ursinha> (please please)
131 <matsubara> [action] stub to request CP for his branch that fixes oops logging
132 <MootBot> ACTION received: stub to request CP for his branch that fixes oops logging
133 <Ursinha> cool
134 <Ursinha> we have two critical bugs, already fix committed
135 <Ursinha> so, good
136 <matsubara> cool
137 <Ursinha> about the failing scripts
138 <matsubara> we had some scripts failing this week
139 <matsubara> nightly, productreleasefinder and garbo-hourly
140 <matsubara> and rosetta-poimport too
141 <matsubara> nightly was already addressed by jtv
142 <Ursinha> matsubara, productreleasefinder isn't expected to fail anymore? sinzui?
143 <matsubara> as a rosetta script was taking too long; jtv will remove it from nightly and add a cronjob for it
144 <sinzui> Ursinha: no, but the errors I see are not failures... the script was not run
145 <matsubara> stub, do you know why garbo-hourly is failing?
146 <stub> It's failing?
147 <sinzui> matsubara: many scripts are not running because of one long process
148 <matsubara> henninge, rosetta-poimport failed on the 5th. can you investigate and reply to the list?
150 <Ursinha> matsubara, it's not being run, it seems
152 <henninge> matsubara: sure, I will.
153 <matsubara> stub, I got a few emails: "Scripts failed to run: loganberry:garbo-hourly"
154 <sinzui> Ursinha: matsubara there is some traffic about this. spm reported the long-running process a week ago. I had asked why the prf had not run
155 <matsubara> and no replies to the list, so I'm asking here
156 <matsubara> thanks henninge
157 <Ursinha> matsubara, actually stub replied
159 <stub> Oh - there were some blocked runs because the rosetta export-to-branch script was running in a 5-hour-long transaction
160 <stub> So the script blocks because it doesn't want to make anything worse.
161 <matsubara> [action] henninge to investigate rosetta-poimport script failure on Aug 5th and report back to the list
162 <MootBot> ACTION received: henninge to investigate rosetta-poimport script failure on Aug 5th and report back to the list
164 <Ursinha> so I guess it's ok
165 <Ursinha> that's all for this section
166 <Ursinha> from me
167 <Ursinha> thanks everyone
168 <Ursinha> !
169 <matsubara> all right. thanks everyone
170 <Ursinha> you can move on matsubara
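The blocking stub describes (scripts such as garbo-hourly declining to run while the rosetta export-to-branch script holds a 5-hour transaction) can be sketched as a guard check run before a script does any work. This is an assumption about the mechanism, not Launchpad's code: `should_skip_run` and the one-hour threshold are hypothetical, and the transaction start times would have to come from somewhere like the `xact_start` column of PostgreSQL's `pg_stat_activity` view (the query itself is not shown).

```python
from datetime import datetime, timedelta


def should_skip_run(open_transaction_starts, now, max_age=timedelta(hours=1)):
    """Return True if any open transaction has been running longer than max_age.

    A script that calls this before doing any work can bow out instead of
    piling more load onto a database that is already stuck behind a long
    transaction. The one-hour default is a hypothetical threshold.
    """
    return any(now - started > max_age for started in open_transaction_starts)


if __name__ == "__main__":
    now = datetime(2009, 8, 5, 12, 0)
    # A 5-hour-old transaction, as in the export-to-branch case: skip this run.
    print(should_skip_run([now - timedelta(hours=5)], now))
```

Skipping rather than waiting keeps a single stuck transaction from turning into a queue of blocked cron jobs, which matches stub's "doesn't want to make anything worse".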
171 <matsubara> [TOPIC] * Operations report (mthaddon/herb/spm)
172 <MootBot> New Topic: * Operations report (mthaddon/herb/spm)
173 <herb> 2009-07-31 - Rolled out r8323 to bzrsyncd
174 <herb> 2009-08-05 - Cherry picks for code imports, lpnet* and the script server.
175 <herb> Our monitoring system has been timing out when connecting to the app servers more often this week. Admittedly, its timeout is set lower than the OOPS timeout, but we've also been noticing higher load on the app servers. Ursinha covered this during the oops/critical bugs/broken scripts section.
176 <herb> There is currently one cherry pick and one database query awaiting (dis)approval.
177 <herb> The LOSAs currently have 14 bugs marked High and Triaged, only one of which is assigned to someone and targeted at a release. We would be grateful to see some movement on these.
178 <herb> We're currently running with a single slave in preparation for the sprint next week.
179 <mthaddon> also wanted to check that there should be a cherry pick request for the cowboyed storm change to lpnet9 and lpnet10 (per the production status wiki page)
180 <flacoste> cowboyed storm change?
181 <mthaddon> flacoste: https://pastebin.canonical.com/20503/ under eggs/storm-0.14salgado_storm_launchpad_288_308-py2.4-linux-i686.egg
182 <flacoste> mthaddon, herb: i'll look at the LPS to approve/decline
183 <flacoste> right
184 <matsubara> herb, do you keep that list of 14 bugs somewhere? in a wiki page or have a tag to group them?
185 <flacoste> mthaddon: the cherry pick would simply be to update that dependency
186 <herb> matsubara: bugs.launchpad.net/~canonical-losas
187 <mthaddon> flacoste: well in any case, the CP that was requested (and performed) yesterday overwrote it, so it needs to be formalised so other CPs don't overwrite it again
188 <flacoste> sinzui: can salgado make an appropriate CP request?
189 <sinzui> Yes
190 <flacoste> it's simply a new upload to download-cache with a versions.cfg change
191 <matsubara> sinzui, flacoste, intellectronica, rockstar: Could you take a look at herb's bug list (bugs.launchpad.net/~canonical-losas) and see what your teams can do about the high ones in the short term?
192 <flacoste> ok
193 <herb> clearly we're not looking for all of them to be fixed by the next meeting (though that would be great ;)
194 <herb> mostly we'd just like to know they're staying on the right radars and being worked on as appropriate.
195 <matsubara> cool
196 <matsubara> anything else for herb?
197 <intellectronica> herb: so, basically, these are mostly bugs which will make life easier for you when fixed?
198 <sinzui> bug 348722 should become invalid when we update all pmt teams to become true private teams
199 <ubottu> Launchpad bug 348722 in launchpad-code "Set default branch visibility to "forbidden" if any team set to 'Private'" [High,Triaged] https://launchpad.net/bugs/348722
200 <herb> intellectronica: some of them are genuine operational issues, some of them are quality of life issues for the LOSAs
201 <sinzui> There should be no private-membership teams at the start of week 1
202 <intellectronica> cool, sure, we'll take a look and see if there's any low hanging fruit
203 <sinzui> barry will be working with the losas on August 11 to fix bug 325962
204 <ubottu> Launchpad bug 325962 in launchpad-registry "lp-mailman startup is blocking on a pid file in the wrong directory" [High,Triaged] https://launchpad.net/bugs/325962
205 <herb> sinzui: that was the one that was assigned and targeted at a release.
206 <sinzui> herb, many times
208 <herb> heh
209 <matsubara> all right. I think that's it
210 <sinzui> herb, it failed my rule: a bug is not High if it is not worked on by all parties within 3 months
211 <herb> thanks
212 <matsubara> thanks herb and everyone
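sinzui's triage rule for bug 325962 (a bug is not High if nobody has worked on it within 3 months) is simple enough to state as a check. A sketch under loose assumptions: `fails_high_rule` is a hypothetical name, "3 months" is approximated as 90 days, and "worked on by all parties" is reduced to a single most-recent-activity timestamp.

```python
from datetime import datetime, timedelta

# "3 months" approximated as 90 days for this sketch.
THREE_MONTHS = timedelta(days=90)


def fails_high_rule(importance, last_worked_on, now, window=THREE_MONTHS):
    """Return True if a bug is High but has seen no work within `window`.

    last_worked_on is the datetime of the most recent work by anyone
    involved, or None if nobody has worked on the bug at all.
    """
    if importance != "High":
        return False
    if last_worked_on is None:
        return True
    return now - last_worked_on > window


if __name__ == "__main__":
    # Untouched since February, checked in August: the rule says downgrade it.
    print(fails_high_rule("High", datetime(2009, 2, 1), datetime(2009, 8, 6)))
```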
213 <matsubara> [TOPIC] * DBA report (stub)
214 <MootBot> New Topic: * DBA report (stub)
215 <stub> We set off some alerts when the poimport script and PostgreSQL decided that lots of disk space should be used. We see some smaller spikes, which are just PG using disk to store intermediate results, but this time it was large enough to set off the alarms.
216 <stub> We have seen this once before, and in neither case have we been able to reproduce it. My best hypothesis is that the planner statistics triggered a really bad query plan, so I'll bump the planner statistics sample size on the production dbs in case this prevents future occurrences.
217 <matsubara> henninge, maybe the last rosetta-poimport failure was related to that ^
218 <henninge> matsubara: I believe we already know what it was about and it may be related to that.
219 <henninge> matsubara: I'll talk to the guys.
220 <matsubara> henninge, cool. thanks
221 <matsubara> stub, anything else?
222 <stub> Not that I can think of
223 <matsubara> all right. thank you stub
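stub did not say exactly which setting he would change to bump the planner statistics sample size. One common way to do it in PostgreSQL is the `default_statistics_target` setting, applied per database with `ALTER DATABASE ... SET`. The hypothetical helper below only builds that statement; the database name in the test is made up, and the choice of setting is an assumption rather than what was actually deployed.

```python
def statistics_target_sql(database, target):
    """Build SQL that raises the planner statistics sample size for one database.

    Uses PostgreSQL's default_statistics_target as a per-database default.
    The new value applies to sessions started afterwards, and the planner
    only sees larger samples once the tables have been re-ANALYZEd.
    """
    if not isinstance(target, int) or target < 1:
        raise ValueError("target must be a positive integer")
    # Quote the identifier, doubling any embedded double quotes.
    quoted = '"' + database.replace('"', '""') + '"'
    return f"ALTER DATABASE {quoted} SET default_statistics_target = {target};"
```

Setting it per database rather than in postgresql.conf keeps the change scoped to the production databases stub mentioned.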
224 <matsubara> I guess that's all for today
225 <matsubara> Thank you all for attending this week's Launchpad Production Meeting. See https://dev.launchpad.net/MeetingAgenda for the logs.
226 <matsubara> #endmeeting
227 <MootBot> Meeting finished at 10:44.