1 <matsubara> #startmeeting
2 <MootBot> Meeting started at 10:00. The chair is matsubara.
3 <MootBot> Commands Available: [TOPIC], [IDEA], [ACTION], [AGREED], [LINK], [VOTE]
4 <matsubara> Welcome to this week's Launchpad Production Meeting. For the next 45 minutes or so, we'll be coordinating the resolution of specific Launchpad bugs and issues.
5 <matsubara> [TOPIC] Roll Call
6 <MootBot> New Topic: Roll Call
7 <sinzui> me
8 <matsubara> Not on the Launchpad Dev team? Welcome! Come "me" with the rest of us!
9 <gary_poster> me
10 <Ursinha> me
11 <danilos> me
12 <matsubara> stub, cprov, herb, rockstar, intellectronica: hi
13 <cprov> me
14 <rockstar> ni!
15 <mthaddon> me
16 <intellectronica> me
17 <matsubara> hi mthaddon
18 <mthaddon> matsubara: herb won't be attending these meetings any more since he's no longer a LOSA
19 <Ursinha> it's true
20 <matsubara> mthaddon, indeed!
21 <matsubara> let me update the page
22 <Chex> hello!
23 <mthaddon> matsubara: most likely Chex will be his replacement (given he's on the same timezone that herb was on)
24 <Ursinha> hi Chex, welcome!
25 <matsubara> mthaddon, all right thanks
26 <matsubara> hi Chex, welcome
27 <Chex> all: thank you
28 <stub> moo
29 <matsubara> ok, everyone is here
30 <matsubara> [TOPIC] Agenda
31 <MootBot> New Topic: Agenda
32 <intellectronica> hi Chex, welcome
33 <matsubara> * Actions from last meeting
34 <matsubara> * Oops report & Critical Bugs & Broken scripts
35 <matsubara> * Operations report (mthaddon/herb/spm)
36 <matsubara> * DBA report (stub)
37 <matsubara> [TOPIC] * Actions from last meeting
38 <Ursinha> matsubara, you may want to s/flacoste/gary_poster in that page
39 <MootBot> New Topic: * Actions from last meeting
40 <matsubara> Ursinha, already done
41 <Ursinha> matsubara, thanks
42 <Andre_Gondim> me
43 <matsubara> * ursinha to chase mars about OOPS-1307J16 and file a bug about it
44 <matsubara> * matsubara to file a bug for OOPS-1315A253
45 <matsubara> * Filed https://launchpad.net/bugs/413706
46 <matsubara> * sinzui to file bugs for OOPS-1318S626, OOPS-1321EB223 and OOPS-1318EA4
47 <matsubara> * gary_poster to chase librarian-gc failure and report back to the list
48 <matsubara> * matsubara to ask stub to email the dba report to the list
49 <matsubara> * stub sent the dba report to the list
50 <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=1307J16
51 <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=1315A253
52 <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=1318S626
53 <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=1321EB223
54 <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=1318EA4
55 <ubottu> Ubuntu bug 413706 in launchpad-foundations "InvalidURIError using %s as the search term in the global search" [Undecided,New]
56 <matsubara> hi Andre_Gondim, welcome
57 <Andre_Gondim> thanks =]
58 <matsubara> hi sinzui, did you file those bugs?
59 <matsubara> Ursinha, no news about that oops? shall I remove the action item?
60 <Ursinha> matsubara, do that, I'll file a bug if that happens again
61 <matsubara> Ursinha, thanks
62 * sinzui has no screen
63 <matsubara> re: the librarian-gc failure, it was disabled that week, that's why we had a script failure email to the list
64 <gary_poster> stub is working on that as his next task
65 <mthaddon> I think there's a CP pending approval for that
66 <sinzui> matsubara: I did file bugs
67 <matsubara> gary_poster, mthaddon: cool. thanks
68 <stub> The next bit of work on the librarian may be related - depends on what happens with the cherry pick and test run ;)
69 <Ursinha> gary_poster, this is bug 410576, right?
70 <sinzui> OOPS-1315A253 is soyuz
71 <ubottu> Launchpad bug 410576 in launchpad-foundations "Librarian-gc discovered file missing from disk" [Critical,Triaged] https://launchpad.net/bugs/410576
72 <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=1315A253
73 <matsubara> sinzui, thanks. if you have them handy, could you priv msg them to me?
74 * gar0t0 (n=gar0t0@unaffiliated/gar0t0) has joined #launchpad-meeting
75 <sinzui> bug 413174
76 <ubottu> Launchpad bug 413174 in launchpad-registry "API AssertionError creating a release" [Low,Triaged] https://launchpad.net/bugs/413174
77 <gary_poster> Ursinha, that's not my understanding. hm, that's a dupe.
78 <Ursinha> gary_poster, a dupe? is there another?
79 <Ursinha> this one is set as Critical... I'll talk about it in the next section :)
80 <sinzui> matsubara: OOPS-1318EA4 is new. It relates to another bug that I intend to fix in 3.0 I will file and assign it
81 <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=1318EA4
82 <matsubara> thanks sinzui
83 <gary_poster> Ursinha: either dupe or related: bug 413749
84 <ubottu> Bug 413749 on http://launchpad.net/bugs/413749 is private
85 <Ursinha> gary_poster, let me see
86 <Ursinha> matsubara, you can move to the next section and we keep discussing there
87 <matsubara> ok, thanks Ursinha and gar0t0
88 <matsubara> err
89 <matsubara> gary_poster,
90 <gary_poster> :-)
91 <matsubara> [TOPIC] * Oops report & Critical Bugs & Broken scripts
92 <MootBot> New Topic: * Oops report & Critical Bugs & Broken scripts
93 <matsubara> there you go Ursinha
94 <Ursinha> okay
95 <Ursinha> +branches timeout has a fix already committed, and also that horrible 'specications' bug is fix committed as well
96 <Ursinha> so, two issues to ask: foundations and registry
97 <Ursinha> sinzui, I can see a lot of these ExpatErrors, which are bug 403606. Has barry said anything about fixing that?
98 <Ursinha> gary_poster, bug 410576 is Critical but I see there's no activity for almost a week now, is that really critical?
99 <ubottu> Launchpad bug 403606 in launchpad-registry "ExpatError errors should be handled to not generate the OOPSes" [High,Triaged] https://launchpad.net/bugs/403606
100 <ubottu> Launchpad bug 410576 in launchpad-foundations "Librarian-gc discovered file missing from disk" [Critical,Triaged] https://launchpad.net/bugs/410576
101 <Ursinha> (in the meantime, I'll check bug 413749
102 <ubottu> Bug 413749 on http://launchpad.net/bugs/413749 is private
103 <Ursinha> )
104 * JoaoSantana (n=joao@200.165.133.50) has joined #launchpad-meeting
105 <gary_poster> Ursinha: I believe it is high: afaik, the criticality is what mthaddon describes in his comments to that issue. This is what stub is going to work on next.
106 <sinzui> Ursinha: barry has not provided any insight into the issue yet. I cannot estimate it
107 <stub> It's critical because it is part of the impending librarian collapse.
108 <sinzui> matsubara: bug #41648
109 <mthaddon> gary_poster: it's critical - LP will blow up in 20 days or so if it's not fixed (as the librarian will run out of space)
110 <ubottu> Launchpad bug 41648 in acpi "Sleep and hibernate fail on Acer Ferrari 3400" [Medium,Fix released] https://launchpad.net/bugs/41648
111 <matsubara> sinzui, hmm that doesn't look like a lp bug
112 <sinzui> matsubara: bug #416483
113 <ubottu> Launchpad bug 416483 in launchpad-registry "deletion of series and milestone must remove structural subscriptions" [High,Triaged] https://launchpad.net/bugs/416483
114 <matsubara> cool. thanks sinzui!
115 <sinzui> ^ points to the related bug too
116 <gary_poster> mthaddon, stub: (procedural, apologies) what does critical mean then? I thought it meant drop everything, while afaict this is a do it within 10 days?
117 <Ursinha> gary_poster, mthaddon, we have two bugs here, bug 410576 and bug 413749
118 <ubottu> Launchpad bug 410576 in launchpad-foundations "Librarian-gc discovered file missing from disk" [Critical,Triaged] https://launchpad.net/bugs/410576
119 <ubottu> Bug 413749 on http://launchpad.net/bugs/413749 is private
120 <Ursinha> gary_poster, that's my question as well
121 <mthaddon> gary_poster: I think if we know it's going to blow up all of LP in a short period of time, that's critical
122 <gary_poster> afaik 413749 is the (a?) symptom of 410576. stub, mthaddon, can you please correct me?
123 <stub> gary_poster: It is my top priority, as we need to know the genuine rate of disk consumption for the librarian so we can accurately predict when new disk has to be purchased and installed by, or soyuz has to decrease their consumption by
124 <gary_poster> stub thank you
125 <mthaddon> gary_poster: it's related, but fixing the librarian-gc will buy us more time, not fix it forever
126 <gary_poster> ok, gotcha
127 <gary_poster> So Ursinha, it is critical, and we should be moving to in progress, at least, within a day or so.
128 <Ursinha> great gary_poster, thanks a lot
129 <matsubara> Ursinha, anything else re: oops and critical bugs?
130 <Ursinha> sinzui, could you poke barry again about that bug? I can do that as well if you want :)
131 <sinzui> I will
132 <Ursinha> thanks a lot sinzui
133 <cprov> stub: we have to adjust the removal of BPRs to be more aggressive.
134 <danilos> cprov: can you (i.e. Soyuz team) provide data flacoste asked for in https://bugs.edge.launchpad.net/launchpad-foundations/+bug/413749 so we've got raw numbers there as well?
135 <ubottu> Error: This bug is private
136 <stub> cprov: Bug 413749 has a soyuz task, so you may want to triage it.
137 <cprov> danilos: sure, I can try.
138 <matsubara> garbo-hourly failed on the 17th even after spm adjusted the check to 12 hours. stub do you know what's up?
139 <mthaddon> cprov: any idea of how much space that would buy us?
140 <ubottu> Bug 413749 on http://launchpad.net/bugs/413749 is private
141 <stub> matsubara: I wasn't aware of that.
142 <cprov> mthaddon: can't tell exactly, but I'll issue the queries to estimate a few scenarios other than a 1 month quarantine for BPR files
143 <mthaddon> ok
144 <matsubara> there's a "Scripts failed to run: loganberry:garbo-hourly" email sent to the list on the 17th. could you investigate and reply to that email?
145 <matsubara> stub, ^
146 <Ursinha> cprov, can you follow up later on that bug then, please?
147 <cprov> Ursinha: sure
148 <Ursinha> thanks cprov
149 <matsubara> [action] cprov to follow up on bug 413749
150 <MootBot> ACTION received: cprov to follow up on bug 413749
151 <ubottu> Bug 413749 on http://launchpad.net/bugs/413749 is private
152 <ubottu> Bug 413749 on http://launchpad.net/bugs/413749 is private
153 <matsubara> [action] stub to investigate garbo-hourly failure after spm adjusted script checking to 12h
154 <MootBot> ACTION received: stub to investigate garbo-hourly failure after spm adjusted script checking to 12h
155 <matsubara> [action] sinzui to poke barry about ExpatError OOPSes (bug 403606)
156 <MootBot> ACTION received: sinzui to poke barry about ExpatError OOPSes (bug 403606)
157 <ubottu> Launchpad bug 403606 in launchpad-registry "ExpatError errors should be handled to not generate the OOPSes" [High,Triaged] https://launchpad.net/bugs/403606
158 <sinzui> done
159 * sinzui eagerly awaits an assessment
160 <matsubara> cool
161 <matsubara> I think that's all for this section
162 <matsubara> thanks everyone
163 <Ursinha> thanks a bunch sinzui
164 <Ursinha> and everyone else :)
165 <Ursinha> do ahead matsubara
166 <Ursinha> *go
167 <matsubara> [TOPIC] * Operations report (mthaddon/Chex/spm)
168 <MootBot> New Topic: * Operations report (mthaddon/Chex/spm)
169 <danilos> mbarnett for the agenda as well? :)
170 <mthaddon> :)
171 <Chex> - Buildbot now hosted from the DC
172 <Chex> - Multiple Cherry Picks this past week
173 <Chex> - Will be beginning to implement recommendations from SplitIt Sprint before too long
174 <Chex> - Codebrowse needed restarting more than usual this week (see IncidentLog)
175 <Chex> - Incident with edge rollout breaking as one app server refused to stop, and interaction with the session DB being trashed - see Incident Report and most likely discussed earlier in the meeting
176 <Chex> - LOSA sprint this week to get new LOSAs (Chex, mbarnett) up to speed
177 <matsubara> danilos, good catch. thanks
178 <Chex> and that's it for us, unless there are any questions?
179 <gary_poster> yay buildbot in DC! :-)
180 <danilos> yeah, great stuff, looking forward to everything else that enables :)
181 <danilos> (like the production branch in buildbot *grin*)
182 <matsubara> thanks Chex
183 <matsubara> [TOPIC] * DBA report (stub)
184 <MootBot> New Topic: * DBA report (stub)
185 <stub> Our disk usage is going steadily up. Nothing alarming yet, but it did prompt me to turn on the long-running-transaction killer. Non-system transactions running over 3 hours will now be killed. This should alleviate database bloat, which adversely affects everything. It will also stop processes that block on long running transactions from blocking too long (like the garbo).
186 <stub> I've bumped up the default statistics target to 250. We have twice over the last several months had a query chewing up huge amounts of disk space in temporary tables, and my best guess as to why is bad query plans. The higher statistics target should make this less likely.
187 <stub> Done.
188 * Ursinha misses the oot thing
189 <Ursinha> questions for stub?
190 <danilos> stub: ok, so that means that fixing langpack exporter is now critical for us, right?
191 <stub> danilos: I can turn it off if necessary. I'm not sure what effect it has on the langpack export.
192 <stub> Will all of them be affected?
193 <stub> oot
194 <danilos> stub: most of the runs will
195 <Ursinha> hehe
196 <danilos> stub: I've made it critical for us, it should be a simple fix, it'll only require cherrypicking
197 <stub> danilos: ok. I'd like that issue raised to high or critical. I'll turn the check to 8 hours which will cover the current longest transaction I'm seeing in the graphs.
198 <stub> k
199 <danilos> stub: it was high and scheduled for 3.0, now it's scheduled for asap :)
200 <stub> Please add a note to the CP request that the limit needs to be put back.
201 <danilos> stub: sure, thanks
202 <Ursinha> thanks stub
203 <Ursinha> and danilos
204 <matsubara> thanks stub and danilos
205 <stub> danilos: Bug number?
206 <danilos> stub: bug 411697
207 <ubottu> Launchpad bug 411697 in rosetta "Language pack export has very long running transactions" [Critical,Triaged] https://launchpad.net/bugs/411697
208 <matsubara> * In-team handling of OOPSes (Danilo)
209 <danilos> ok, a long paste follows
210 * matsubara hands the mic to danilos
211 <danilos> Breaking news from the team leads call! Read all about it!
212 <danilos> Many of the duties Diogo and Ursula spoiled you with (like trawling OOPS summaries and error logs and matching/filing relevant bugs) are what QA contacts in each team should do (generally, it was considered that this is what they should have been doing anyway).
213 <danilos> According to Gary, Diogo is happy to continue maintaining oops-tools (and relevant infrastructure, which will stay in Foundations turf), but everybody else is invited to contribute and take interest in the tools if they want something added.
214 <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=tools
215 <danilos> Similarly, if someone finds it hard to go through numerous places to see all the possible problems (i.e. going through several OOPS summaries, error-reports list, etc), they are welcome to improve our infrastructure for aggregating these.
216 <danilos> I am personally hoping that once we pick a release manager for 3.0, (s)he'll take care that all QA contacts are on top of their game. Perhaps we can have Ursula and Diogo continue as is until RM for 3.0 is appointed.
217 <danilos> Any suggestions on what should change in the format of the meeting to make sure this is not a regression compared to what we do today?
218 <gary_poster> (eh, that summary came out in such a way that I feel I should have talked with matsubara first. sorry, matsubara, and feel free to correct the summary about your personal position)
219 <matsubara> gary_poster, it's correct :-)
220 <danilos> gary_poster: (I was just being careful not to put words in matsubara's mouth, I should have talked to him first, but there just wasn't the time between the teamleads call and this meeting :)
221 * gmb (n=gmb@i-83-67-31-25.freedom2surf.net) has joined #launchpad-meeting
222 <gary_poster> cool :-)
223 <danilos> anyway, how should the meetings be run from now on? matsubara, you want to keep running them?
224 <gary_poster> +1 if you are willing matsubara
225 <matsubara> danilos, yes, I talked to francis about it and Ursinha and I will still run the production meeting
226 <cprov> +1
227 <danilos> does anybody else have any comments? everybody, this means more work for you and less for matsubara, Ursinha :)
228 <Ursinha> +1 from me
229 <stub> How do teams claim an oops? The benefit of a central monitor and this meeting is when teams disagree on who the problem belongs to.
230 <matsubara> but it'd be nice to have help from the QA contacts doing the daily oops analysis and help with triage
231 <danilos> stub: that's for the release manager to worry about IMO, but in general, we should be having bug attached to all the OOPSes
232 <stub> Who creates the bugs?
233 <Ursinha> danilos, that's the idea
234 <Ursinha> stub, it depends
235 <Ursinha> stub, for instance, afaik, translations has been creating its own bugs for some time now
236 <Ursinha> checking the summaries daily
237 <danilos> stub: in general, we might be able to improve tools to split summaries by vhost initially
238 <stub> I'm just wondering how we avoid them being dropped on the floor because, say, translations thinks an oops is a foundations issue and vice versa.
239 <Ursinha> danilos, matsubara has the idea of using page ids
240 <Ursinha> for splitting
241 <Ursinha> *had
242 <danilos> Ursinha: right, that might be a good one as well
243 <stub> splitting the reports into areas of responsibility would address my concern I think.
244 <danilos> Ursinha: actually, it's perfect
245 <cprov> okay, at the risk of sounding like an idiot, who are the current QA contacts? TLs?
246 <Ursinha> cprov, the people that attend this meeting
247 <stub> TLs until they delegate ;)
248 <matsubara> cprov, everyone who attends this meeting weekly
249 <danilos> cprov: it means it's you! :)
250 <matsubara> cprov, actually it's bigjools, but he's away today
251 <cprov> fantastic! thanks.
252 <Ursinha> danilos, :P, bigjools actually
253 <danilos> heh, ok... in general, I think this is best done by a team lead
254 <danilos> (and soon enough, I'll be replacing henninge as the translations QA contact)
255 <Ursinha> danilos, it was the TLs' call when they appointed the QA contacts
256 <danilos> Ursinha: I know
257 <gary_poster> hm. question. if we *all* trawl oops, is that a collective time loss?
258 <Ursinha> but that can be changed for this new experiment
259 <Ursinha> gary_poster, if we separate per teams, not that much
260 <Ursinha> I believe
261 <danilos> so, matsubara, can we have an action for me to discuss with Ursinha and you how we can split OOPS reports into per-team summaries?
262 <Ursinha> per "teams"
263 <gary_poster> oh I see
264 <danilos> gary_poster: right, see above
265 <gary_poster> ok thanks
266 <matsubara> [action] danilos, Ursinha and matsubara to discuss oops summaries split per team
267 <MootBot> ACTION received: danilos, Ursinha and matsubara to discuss oops summaries split per team
268 <danilos> matsubara: thanks
269 <Ursinha> gary_poster, we in fact have a new feature on oops-tools that associates a bug with an exception type (matsubara correct me if I'm wrong here)
270 <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=tools
271 <Ursinha> this helps a lot
272 <danilos> ubottu: thanks for nothing (just so you don't get used to praise only)
273 <ubottu> Error: I am only a bot, please don't think I'm intelligent :)
274 <Ursinha> sometimes you freak me out ubottu
275 <Ursinha> anyway
276 <Ursinha> :)
277 <danilos> anyway, that's all settled afaiac
278 <matsubara> gary_poster, Ursinha: now we have a feature on oops-tools that once an oops is linked to a bug, subsequent oopses of that same type are already linked to the bug report
279 <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=tools
280 <Ursinha> gary_poster, if you click the oops, most of them have a bug associated, on top left
281 <danilos> we'll be reporting back, everything stays as is until we've got better oops reports, but do expect changes soon
282 <matsubara> makes analysis much easier
283 <Ursinha> bug report?
284 <matsubara> next step is to add that info to the summary
285 <gary_poster> heh. ah I see cool
286 <gary_poster> thanks Ursinha, matsubara
287 <matsubara> all right. thanks danilos for bringing this up
288 <Ursinha> ah, I got that
289 <matsubara> and thanks everyone
290 <Ursinha> thanks everyone
291 <matsubara> Thank you all for attending this week's Launchpad Production Meeting. See https://dev.launchpad.net/MeetingAgenda for the logs.
292 <matsubara> #endmeeting