1 <matsubara> #startmeeting
2 <MootBot> Meeting started at 09:00. The chair is matsubara.
3 <MootBot> Commands Available: [TOPIC], [IDEA], [ACTION], [AGREED], [LINK], [VOTE]
4 <matsubara> Welcome to this week's Launchpad Production Meeting. For the next 45 minutes or so, we'll be coordinating the resolution of specific Launchpad bugs and issues.
5 <matsubara> [TOPIC] Roll Call
6 <MootBot> New Topic: Roll Call
7 <matsubara> Not on the Launchpad Dev team? Welcome! Come "me" with the rest of us!
8 <henninge> me
9 <Ursinha> me
10 <matsubara> Ursinha, flacoste, bigjools, intellectronica, herb
11 <bigjools> me
12 <herb> me
13 <matsubara> bac, ping
14 <flacoste> me
15 <Ursinha> matsubara, already answered
16 <intellectronica> me
17 <matsubara> rockstar, hi
18 <rockstar> me
19 <rockstar> matsubara, hi
20 <bac> me
21 <matsubara> ok, stub can join later. everyone else is here.
22 <matsubara> [TOPIC] Agenda
23 <MootBot> New Topic: Agenda
24 <matsubara> * Actions from last meeting
25 <matsubara> * Oops report & Critical Bugs
26 <matsubara> * Operations report (mthaddon/herb/spm)
27 <matsubara> * DBA report (DBA contact)
28 <matsubara> [TOPIC] * Actions from last meeting
29 <MootBot> New Topic: * Actions from last meeting
30 <matsubara> * stub to investigate the fix to avoid staging restore problems
31 <matsubara> * matsubara to chase rockstar about a fix for OOPS-1138CEMAIL12
32 <matsubara> * asked jml about this. It's bug 326056 and had importance raised.
33 <matsubara> * cprov and bigjools to investigate OOPS-1145EA14
34 <matsubara> * Ursinha to file bugs:
35 <matsubara> * Bug 333072: https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1143EB189
36 <matsubara> * Bug 333071: https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1145EA14
37 <ubottu> Launchpad bug 326056 in launchpad-bazaar "OOPS on BadStateTransition when reviewing code by mail" [High,Triaged] https://launchpad.net/bugs/326056
38 <ubottu> Launchpad bug 333072 in soyuz "AttributeError OOPS on Build:+index" [Undecided,Invalid] https://launchpad.net/bugs/333072
39 <ubottu> Launchpad bug 333071 in soyuz "AssertionError OOPS on +copy-packages" [High,Triaged] https://launchpad.net/bugs/333071
40 <bigjools> 333072 is invalid
41 <matsubara> bigjools, any news about 333071?
42 <bigjools> yes, it's not too serious, we've set it for 2.2.3
43 <bigjools> it's a corner case in the copying
44 <bigjools> despite the doom-mongering error message
45 <matsubara> ok. thanks bigjools
46 <matsubara> [action] matsubara to chase stub about staging restore problems
47 <MootBot> ACTION received: matsubara to chase stub about staging restore problems
48 <matsubara> [TOPIC] * Oops report & Critical Bugs
49 <MootBot> New Topic: * Oops report & Critical Bugs
50 * matsubara hands Ursinha the mic
51 * Ursinha looks
52 * rockstar runs
53 <Ursinha> registry, foundations, code and bugs: oopses for you
54 <Ursinha> Registry:-
55 <Ursinha> https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1153E919
56 <Ursinha> https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1153A1135 (or foundations, not sure)
57 <Ursinha> Foundations:-
58 <Ursinha> https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1153D667
59 <Ursinha> Code:
60 <ubottu> https://devpad.canonical.com/~jamesh/oops.cgi/1153E919
61 <ubottu> https://devpad.canonical.com/~jamesh/oops.cgi/1153A1135
62 <Ursinha> https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1152XMLP1
63 <ubottu> https://devpad.canonical.com/~jamesh/oops.cgi/1153D667
64 <Ursinha> Bugs:
65 <Ursinha> https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1152EA162
66 <Ursinha> ~
67 <Ursinha> rockstar, ha!
68 <Ursinha> rockstar, have you seen this one: https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1152XMLP1?
69 <rockstar> Ursinha, looking at all of them now.
70 <Ursinha> rockstar, you can just look at code's one :)
71 <Ursinha> sinzui, hi
72 <Ursinha> sinzui, I'm not sure if https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1153A1135 is foundations or registry
73 <ubottu> https://devpad.canonical.com/~jamesh/oops.cgi/1153A1135
74 <matsubara> Ursinha, looks like registry
75 <intellectronica> Ursinha: strange. do you see lots of those?
76 <bac> Ursinha: yes, looks like registry
77 <Ursinha> intellectronica, no, actually not
78 <Ursinha> intellectronica, but never saw one of those before
79 <Ursinha> so better bring to attention
80 <matsubara> intellectronica, Ursinha that one looks like caused by the rollout
81 <sinzui> Ursinha I don't know the answer either. I will look into it and assign it. I suspect salgado-afk is working on
82 <Ursinha> matsubara, even in the time it happened?
83 <intellectronica> matsubara: i also thought so
84 * salgado-afk is now known as salgado
85 <intellectronica> but it is quite early
86 <Ursinha> intellectronica, I've discarded the rollout possibility because of its timestamp
87 <Ursinha> sinzui, thanks for that
88 <matsubara> yeah, too early to be caused by the rollout.
89 <Ursinha> intellectronica, can you take a look then, please?
90 <rockstar> Ursinha, I'll have to investigate our oops. It's the XML-RPC server, and it requires the sacrifice of a virgin goat.
91 <matsubara> check OSAs incident log to see if something happened during that time
92 <intellectronica> so, this isn't really a bugs oops, but i don't know whether it's rollout-related or not. fwiw it's more than three hours before rollout, so it's hard to see how it would be related
93 <Ursinha> rockstar, oh, I have a bunch here in my backyard if you need some
94 <rockstar> Ursinha, :)
95 <Ursinha> intellectronica, I'll do what matsubara suggested
96 <matsubara> [action] ursinha to check OSAs incident log to help identify cause of OOPS-1152EA162
97 <MootBot> ACTION received: ursinha to check OSAs incident log to help identify cause of OOPS-1152EA162
98 <Ursinha> thanks intellectronica and matsubara
99 <matsubara> [action] rockstar to investigate xmlrpc oops OOPS-1152XMLP1
100 <MootBot> ACTION received: rockstar to investigate xmlrpc oops OOPS-1152XMLP1
101 <Ursinha> flacoste, hi
102 <henninge> Translations is happy, that POFile:+translate dropped from the timeout top ten now ..
103 <henninge> btw
104 <henninge> ;)
105 <Ursinha> henninge, indeed, congrats to translate team :)
106 * stub (n=stub@canonical/launchpad/stub) has joined #launchpad-meeting
107 <Ursinha> translations
108 <Ursinha> there he is :)
109 <henninge> Ursinha: thank you, I will pass it on.
110 <Ursinha> sinzui, about the other oops
111 <stub> Sorry - on a call and didn't realize the time
112 <sinzui> bac: can you look at it.
113 <bac> Ursinha: they seem to be related (acting for sinzui today)
114 * sinzui is in another meeting
115 <flacoste> hmm
116 <flacoste> i'd say registry
117 * cumulus007 (n=sander@unaffiliated/cumulus007) has joined #launchpad-meeting
118 <bac> yes, i think registry for both
119 <flacoste> Ursinha are you talking about OOPS-1153A1135?
120 <ubottu> https://devpad.canonical.com/~jamesh/oops.cgi/1153A1135
121 <Ursinha> bac, hi :) so, can you take a look in both oopses? do you need me to file bugs about them?
122 * Ursinha looks
123 <bac> Ursinha: yes i'll look at them both
124 <bac> i can open the bugs
125 <bac> unless you need the karma
126 <Ursinha> flacoste, no, https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1153D667
127 <ubottu> https://devpad.canonical.com/~jamesh/oops.cgi/1153D667
128 <Ursinha> bac, haha, no
129 <flacoste> Ursinha: that's also a registry query
130 <Ursinha> [action] bac to file bugs and take care of https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1153E919 and https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1153A1135
131 <ubottu> https://devpad.canonical.com/~jamesh/oops.cgi/1153E919
132 <ubottu> https://devpad.canonical.com/~jamesh/oops.cgi/1153A1135
133 <matsubara> [action] bac to file bugs for OOPS-1153E919 and OOPS-1153A1135
134 <MootBot> ACTION received: bac to file bugs for OOPS-1153E919 and OOPS-1153A1135
135 <ubottu> https://devpad.canonical.com/~jamesh/oops.cgi/1153E919
136 <ubottu> https://devpad.canonical.com/~jamesh/oops.cgi/1153A1135
137 <ubottu> https://devpad.canonical.com/~jamesh/oops.cgi/1153E919
138 <ubottu> https://devpad.canonical.com/~jamesh/oops.cgi/1153A1135
139 <bac> wow, y'all are insistent today! :)
140 <Ursinha> :)
141 <Ursinha> flacoste, hm.
142 <Ursinha> thanks
143 <Ursinha> bac, can you take a look at that too?
144 <bac> which?
145 <Ursinha> promise not to paste the oops again
146 * danilo-afk is now known as danilos
147 <Ursinha> bac, https://devpad.canonical.com/~jamesh/oops.cgi/1153D667
148 <Ursinha> I tried :)
149 * bac looks
150 <bac> yes
151 <Ursinha> bac, thanks
152 <Ursinha> that's all from me from the oops land
153 <matsubara> [action] bac to also file a bug and take care of OOPS-1153D667
154 <ubottu> https://devpad.canonical.com/~jamesh/oops.cgi/1153D667
155 <MootBot> ACTION received: bac to also file a bug and take care of OOPS-1153D667
156 <ubottu> https://devpad.canonical.com/~jamesh/oops.cgi/1153D667
157 <matsubara> ok, thanks everyone.
158 <matsubara> [TOPIC] * Operations report (mthaddon/herb/spm)
159 <MootBot> New Topic: * Operations report (mthaddon/herb/spm)
160 <Ursinha> there's one critical bug, though
161 <Ursinha> argh
162 <Ursinha> bad bad timing
163 <herb> shall I wait for the critical bug?
164 <matsubara> danilo is handling the critical bug, so won't duplicate what's in the bug report.
165 <Ursinha> herb, just a second, let me check with henninge
166 <matsubara> it's bug 334787
167 <Ursinha> matsubara, okay, if you say so
168 <ubottu> Launchpad bug 334787 in rosetta "Ubuntu packagers are not translation editors (assertion error)" [Critical,In progress] https://launchpad.net/bugs/334787
169 <matsubara> let's move on
170 <Ursinha> go ahead herb, thanks
171 <herb> 2009-02-20 - We had an issue that may have caused some users to experience intermittent outages on Launchpad. I worked with joey and flacosted to find the issue. joey's notes were sent to the list. I would be interested in hearing any updates we might have on this issue.
172 <herb> 2009-02-21 and 2009-02-22 - It appears we had a bit of buggy code land on edge that caused a performance problem on both edge and production. The revision was backed out and I believe the code has been fixed.
173 <herb> 2009-02-26 - We rolled out 2.2.2 based on r7763
174 <herb> We continue to see problems relating to bug #156453 and bug #118625. So much so that we're going to start bouncing codebrowse regularly to hopefully head off any issues. I want to emphasize that this will be masking the problem and we really do need to find the root cause and fix it.
175 <ubottu> Launchpad bug 156453 in loggerhead "production loggerhead branch leaks memory" [Critical,Triaged] https://launchpad.net/bugs/156453
176 <ubottu> Launchpad bug 118625 in launchpad-bazaar "codebrowse sometimes hangs" [High,Triaged] https://launchpad.net/bugs/118625
177 <herb> Bug #260171 continues to creep up regularly (every few days). This is already marked as high and I know that mwhudson's plate is full with codebrowse issues, but can we get an update on this one?
178 <ubottu> Bug 260171 on http://launchpad.net/bugs/260171 is private
179 * herb somehow managed to change flacoste into a verb.
180 <danilos> matsubara, Ursinha: I am running tests on the critical bug fix, will let you know once it has landed
181 <flacoste> i saw!
182 <bac> i've been flacosted!
183 <matsubara> danilos, thanks
184 <Ursinha> thanks danilos
185 <matsubara> rockstar, can you bring up the codebrowse issue to the code team?
186 <rockstar> matsubara, everyday. :)
187 <matsubara> rockstar, thanks :-)
188 <rockstar> Codebrowse is being ACTIVELY worked on. It'd be nice if we knew what the issue is. Right now, we're just fixing things and hoping that was the problem.
189 <herb> rockstar: let the losas know if there is anything we can do to help.
190 <rockstar> herb, we certainly will.
191 <stub> Should we be bringing in any outside help to instrument, test and diagnose the issue?
192 <matsubara> herb, anything happened to the DB during the time of this OOPS-1152EA162?
193 <matsubara> or maybe stub might know ^
194 <herb> matsubara: nothing in the incident log.
195 <stub> matsubara: That is one of the connection reaper scripts kicking in
196 <herb> matsubara: I think that's also on the void between LOSAs.
197 <herb> ah, there we go.
198 <stub> We kill connections idle in a transaction more than a few hours (and should be more aggressive), and appserver connections that have been in a transaction for more than 2 minutes.
199 <Ursinha> stub, I see
200 <matsubara> stub, ok. so if we start seeing too many of those, we have a problem somewhere and a few is kinda normal?
201 <stub> The notification gets sent to the error-reports list (where we can confirm that this is indeed what happened)
202 <matsubara> stub, aha. that's better. I'll chase the lp-errors for that one
203 <matsubara> s/lp-errors/lp-errors list/
204 <stub> If we see many of them, we have a problem. One is probably a problem - appserver requests taking two minutes on the db means we need to investigate why the normal timeout mechanisms didn't work.
205 <matsubara> [action] matsubara to look at the lp-errors list to determine cause of OOPS-1152EA162
206 <MootBot> ACTION received: matsubara to look at the lp-errors list to determine cause of OOPS-1152EA162
207 <matsubara> right. thanks for the explanation
208 <stub> -1 second non-sql time, 0 seconds total time indicates a problem at the appserver? The request never got started?
209 <matsubara> I'll file a bug about that one and we can discuss there
210 <stub> hmm... might be a reconnection bug - perhaps the previous request handled by that thread got killed?
211 <stub> I don't know if we Retry on DisconnectionError exceptions, or if it is a good idea in all cases.
212 <matsubara> ok
213 <matsubara> [TOPIC] * DBA report (stub)
214 <MootBot> New Topic: * DBA report (stub)
215 <matsubara> and thanks herb and stub
216 <stub> New hardware exists and is being brought online by IS. I've realized I might need to tweak the db maintenance scripts (upgrade.py, security.py etc.) to cope with a third replica - I think it only copes with a single master and slave at the moment.
217 <stub> Staging can be moved by the LOSAs as soon as the hardware is available and they have time, which will move that load from the production systems.
218 <stub> I assume the rollout went fine as far as the db upgrade procedure goes.
219 <herb> I assume it did too. I didn't hear any complaints from my colleagues.
220 <matsubara> stub, great news! with the new hardware we won't have the staging restore problems anymore?
221 <herb> stub: what's the plan with the 3rd replica?
222 <stub> The staging restore problems should no longer be a problem.
223 * herb feels like he missed something
224 <stub> herb: We can start by pointing half the appservers at the new slave when it is online. We really should get a connection pool/load balancer thingy though running like pgbouncer, pgpool 1 or 2.
225 <herb> stub: gotcha
226 <stub> herb: I realized just now though that upgrade.py won't apply patches to a third replica, which would be bad. So that needs to be fixed.
227 <herb> yeah. that's important.
228 <stub> Or actually, slonik may take care of all that. I need to confirm anyway.
229 <stub> I forget and it is too late for my brain :)
230 <stub> erm... late as in evening
231 <matsubara> all right. I guess that's all unless there are questions for stub
232 <matsubara> thanks stub
233 <matsubara> Thank you all for attending this week's Launchpad Production Meeting. See the channel topic for the location of the logs.
234 <matsubara> #endmeeting
235 <MootBot> Meeting finished at 09:42.
236 <intellectronica> thanks matsubara
237 <flacoste> hey
238 <flacoste> matsubara: question
239 <flacoste> do we need a new roll-out?
240 <flacoste> and i think it applies to everyone here
241 <matsubara> flacoste, I was on vacation and need to check that
242 <flacoste> anyone requires a new roll-out?
243 <matsubara> but I think there's at least danilos' bug to re roll
244 <bac> flacoste: i don't know of any issues for us
245 <danilos> matsubara, flacoste: yes
246 <stub> I thought it was policy to let enough bugs through qa to require a rerollout?
247 <flacoste> we're getting better at QA stub
248 <flacoste> even the code team weren't that late this cycle :-)
249 <matsubara> ok, so we'll need a re-roll for translations. need to check for the other teams, but so far, there's nothing on the radar
250 <stub> We need a counter somewhere - 'Launchpad has been running for n days without need for a release-critical patch'
251 <Ursinha> stub, :)
252 <matsubara> I think that's all then. thanks everyone