1 <MootBot> Meeting started at 10:00. The chair is matsubara.
2 <MootBot> Commands Available: [TOPIC], [IDEA], [ACTION], [AGREED], [LINK], [VOTE]
3 <matsubara> Welcome to this week's Launchpad Production Meeting. For the next 45 minutes or so, we'll be coordinating the resolution of specific Launchpad bugs and issues.
4 <matsubara> [TOPIC] Roll Call
5 <MootBot> New Topic: Roll Call
6 <rockstar> me
7 <herb> me
8 <cprov> me
9 <sinzui> me
10 <matsubara> Ursinha:
11 * stub (n=stub@canonical/launchpad/stub) has joined #launchpad-meeting
12 <Ursinha> me
13 <stub> me (on the right server this time)
14 * flacoste (n=francis@canonical/launchpad/flacoste) has joined #launchpad-meeting
15 <danilos> me (if no call)
16 <flacoste> me
17 <matsubara> intellectronica: hi
18 <intellectronica> me
19 <matsubara> all right, everyone here
20 <matsubara> [TOPIC] Agenda
21 <MootBot> New Topic: Agenda
22 <matsubara> * Actions from last meeting
23 <matsubara> * Oops report & Critical Bugs
24 <matsubara> * Operations report (mthaddon/herb/spm)
25 <matsubara> * DBA report (stub)
26 <matsubara> [TOPIC] * Actions from last meeting
27 <MootBot> New Topic: * Actions from last meeting
28 <matsubara> * intellectronica to make efforts to take a look at bug 329908
29 <matsubara> * sinzui to talk to kiko about pending cp requests
30 <ubottu> Launchpad bug 329908 in malone "DownloadFailed OOPS when reporting a bug with apport (dup-of: 349646)" [Undecided,New] https://launchpad.net/bugs/329908
31 <ubottu> Launchpad bug 349646 in malone "apport uploads not being found in +filebug" [Undecided,Fix released] https://launchpad.net/bugs/349646
32 <intellectronica> matsubara: that's fixed
33 <matsubara> well, sinzui's one is not needed anymore since that's been released
34 <matsubara> thanks intellectronica
35 <sinzui> matsubara: I removed the requests because it was close to the rollout and the items were not critical
36 <matsubara> sinzui: sure. thanks for checking
37 <matsubara> moving on
38 <matsubara> [TOPIC] * Oops report & Critical Bugs
39 <MootBot> New Topic: * Oops report & Critical Bugs
40 * sinzui has a question about what is critical for unmaintained apps
42 * mthaddon (n=mthaddon@adsl-70-137-154-128.dsl.snfc21.sbcglobal.net) has joined #launchpad-meeting
43 <matsubara> Ursinha: ?
44 <Ursinha> me
45 <Ursinha> 4 bugs to talk about
46 * flacoste has quit (Read error: 104 (Connection reset by peer))
47 <Ursinha> matsubara wants to talk about bug 353530
48 <Ursinha> • bigjools, bug 347194, fixed as RC but still appears on lpnet
49 <Ursinha> • sinzui: bug 353863
50 <Ursinha> • bigjools, bug 353568, timeout at +source/package page
51 <ubottu> Launchpad bug 353530 in malone "OOPS filing a bug using the email interface " [Undecided,New] https://launchpad.net/bugs/353530
52 <matsubara> sinzui: good question. You mean blueprint stuff?
53 <ubottu> Launchpad bug 347194 in soyuz "IntegrityError: duplicate key value violates unique constraint "binarypackagerelease_binarypackagename_key"" [High,Fix committed] https://launchpad.net/bugs/347194
54 <ubottu> Launchpad bug 353863 in launchpad-registry "TypeError when finishing creating user account in lpnet" [Undecided,New] https://launchpad.net/bugs/353863
55 <ubottu> Launchpad bug 353568 in soyuz "ubuntu/source/package/+index timing out" [High,Triaged] https://launchpad.net/bugs/353568
56 <Ursinha> should we raise bug 353568 to critical?
57 <matsubara> sinzui: I think we need to raise that question in the list
58 <matsubara> cprov: what's up with the ones bigjools fixed?
59 * flacoste (n=francis@canonical/launchpad/flacoste) has joined #launchpad-meeting
60 <flacoste> me again
61 <matsubara> hi francis
62 <flacoste> another X lock-up
63 <flacoste> what did i miss?
64 <matsubara> we're doing the oops section
65 <Ursinha> flacoste, the bugs we'll discuss
66 <sinzui> Ursinha: That looks like a critical bug to me
67 <matsubara> so far nothing for foundations
68 <cprov> matsubara: I don't know, AFAICT it's not fixed.
69 <sinzui> Ursinha: I will give it to salgado who is already looking into login/account issues
70 <Ursinha> sinzui, I couldn't reproduce that, don't know if matsubara tried that
71 <matsubara> those oopses are likely to be candidates for RC and next re-roll
72 <Ursinha> for sure
73 <matsubara> Ursinha: I did not
74 <Ursinha> thanks sinzui
75 <flacoste> what login/account issues are we having?
76 <sinzui> Ursinha: salgado saw many oopses he could not reproduce, but I think he can at least explain why
77 <cprov> matsubara: I will look at it this afternoon, maybe I can do something quick to stop the timeout in production
78 <Ursinha> flacoste, bug 353863
79 <ubottu> Launchpad bug 353863 in launchpad-registry "TypeError when finishing creating user account in lpnet" [Undecided,New] https://launchpad.net/bugs/353863
80 <salgado> I'll need help with this one
81 <matsubara> re: bug 353530, intellectronica could you take a look? it's about the OOPS in filing a bug using the email interface but I'm not sure that specific oops is under Bugs' responsibility
82 <ubottu> Launchpad bug 353530 in malone "OOPS filing a bug using the email interface " [Undecided,New] https://launchpad.net/bugs/353530
83 <matsubara> cprov: cool. thanks
84 <intellectronica> matsubara: according to steve's comment that's another case of missing permissions
85 <intellectronica> but i'm not clear whether it was dealt with. i'll check
86 <matsubara> I'm going to add those to the CurrentRolloutBlockers page and use that page to coordinate things that will go in for the re-roll
87 <Ursinha> matsubara, afaik that was just fixed by adding the user to the conf file in the server
88 <matsubara> intellectronica: seems to be dealt with, but my question is more in the sense of how we can avoid that in the future
89 <Ursinha> as per spm explanations
90 <Ursinha> to me
91 <matsubara> so, apparently it was an unusual rollout requirement but nobody added it there
92 <matsubara> Ursinha: don't say server, we have at least 10 "servers" out there :-)
93 <Ursinha> matsubara, sorry :) s/server/server in which the conf was missing/
94 <matsubara> anyway, glancing at it, could be that the slaves were missing the right config?
95 <intellectronica> so it seems
96 <rockstar> matsubara, might that be a question for the db report section?
97 <flacoste> Ursinha, matsubara: we should add test for missing permission
98 <flacoste> matsubara: did you file a bug about the one you wanted me to discuss with stub?
99 <matsubara> flacoste: nope, but I have the pastebin here. I'll file a bug about it right after the meeting
100 <matsubara> [action] matsubara to file a bug about the missing select permissions that delayed the rollout
101 <MootBot> ACTION received: matsubara to file a bug about the missing select permissions that delayed the rollout
102 <flacoste> thanks
103 <matsubara> [action] cprov to look up soyuz bugs 347194, 353568
104 <ubottu> Launchpad bug 347194 in soyuz "IntegrityError: duplicate key value violates unique constraint "binarypackagerelease_binarypackagename_key"" [High,Fix committed] https://launchpad.net/bugs/347194
105 <MootBot> ACTION received: cprov to look up soyuz bugs 347194, 353568
106 <ubottu> Launchpad bug 353568 in soyuz "ubuntu/source/package/+index timing out" [High,Triaged] https://launchpad.net/bugs/353568
107 <cprov> matsubara: the first one is fixed
108 <matsubara> err, sorry about that, I'll edit that entry
109 <matsubara> [action] matsubara to edit #347194 out of the last action :-)
110 <MootBot> ACTION received: matsubara to edit #347194 out of the last action :-)
111 <cprov> matsubara: some errors happened yesterday because I had to reprocess a bunch of binary uploads that failed after the rollout (due to the absence of the launchpad_auth DB user)
112 <Ursinha> cprov, now it makes sense
113 <matsubara> ah, so that also affected other things other than the email interface.
114 <Ursinha> thanks :)
115 <cprov> Ursinha: yes, it was a nightmare, because the buildfarm was full and binaries could not be processed due to the lack of DB access
116 <matsubara> [action] matsubara to include francis suggestion to bug 353530 and ursinha to summarize what spm told her
117 <ubottu> Launchpad bug 353530 in malone "OOPS filing a bug using the email interface " [Undecided,New] https://launchpad.net/bugs/353530
118 <MootBot> ACTION received: matsubara to include francis suggestion to bug 353530 and ursinha to summarize what spm told her
119 <Ursinha> indeed
120 <matsubara> salgado: how can we help you with that one?
121 <salgado> matsubara, I'll let you know once I know. :)
122 <matsubara> [action] salgado to debug and fix bug 353863
123 <ubottu> Launchpad bug 353863 in launchpad-registry "TypeError when finishing creating user account in lpnet" [Undecided,New] https://launchpad.net/bugs/353863
124 <MootBot> ACTION received: salgado to debug and fix bug 353863
125 <matsubara> I think I addressed everything
126 <danilos> Ursinha: has there been any outcome of the timeout discussion?
127 <matsubara> so, as usual after the release we are going to monitor the oops reports constantly and coordinate with the teams about any new oopses
128 <Ursinha> danilos, I'm going to talk about it with stub in his section
129 <danilos> Ursinha: ok, thanks
130 <danilos> sorry for not following the script, I forgot my lines :)
131 <Ursinha> danilos, :)
132 <matsubara> [action] sinzui to email the list how we should address critical bugs on unmaintained apps (e.g. blueprint)
133 <MootBot> ACTION received: sinzui to email the list how we should address critical bugs on unmaintained apps (e.g. blueprint)
134 <matsubara> sinzui: ^ is that correct?
135 <sinzui> matsubara: yes
136 <matsubara> ok, I think that's all for this section. All the critical ones are being handled
137 <matsubara> thanks everyone
138 <matsubara> [TOPIC] * Operations report (mthaddon/herb/spm)
139 <MootBot> New Topic: * Operations report (mthaddon/herb/spm)
140 <herb> 2009-03-30 - Experienced some DB problems that affected the service. Launchpad was unavailable for approximately 9 minutes. stub sent out an email summarizing the issues.
141 <herb> 2009-03-30 - Cherry picked r8054 and part of r7999.
142 <herb> 2009-04-01 - Rollout of 2.2.3. Total downtime was approximately 100 minutes. I think there were a few hiccups on some DB permissions, but I haven't had an opportunity to catch up with mthaddon and spm on the details.
143 <herb> Bug 156453 and bug 118625 continue to be a source of discomfort. I think rockstar has an update on these though.
144 <ubottu> Launchpad bug 156453 in loggerhead "production loggerhead branch leaks memory" [Critical,In progress] https://launchpad.net/bugs/156453
145 <ubottu> Launchpad bug 118625 in launchpad-bazaar "codebrowse sometimes hangs" [High,Triaged] https://launchpad.net/bugs/118625
146 <herb> Bug 80895 and bug 119420 are a pain point for the LOSAs. I think something may have been scheduled for this cycle on this front. If so that's a total win from our point of view.
147 <herb> When do we think we'll be doing a re-roll?
148 <ubottu> Launchpad bug 80895 in malone "Give people five minutes to edit/delete their comment" [Undecided,Confirmed] https://launchpad.net/bugs/80895
149 <ubottu> Launchpad bug 119420 in launchpad-answers "Cannot edit a comment" [Medium,Triaged] https://launchpad.net/bugs/119420
150 <rockstar> herb, I can has update!
151 <rockstar> :)
152 <herb> woo!
153 <rockstar> So we have a memory middleware currently that's allowing us to track down memory issues.
154 <rockstar> herb, also, mwhudson and jam have been writing a C-based memory profiler as well, so we can track refs even better in bzrlib itself.
155 <herb> excellent
156 <matsubara> herb: I'll let you know about the re-roll once we know. :-)
157 <herb> matsubara: appreciated.
158 <rockstar> herb, unfortunately, I can't really tell if the "sometimes hangs" bug is related to the "leaks memory" bug.
159 <matsubara> herb: re: the DB permission, I'm going to file a bug about it and flacoste and stub will discuss it :-)
160 <herb> rockstar: I suspect so, but fixing the memory issue would be a huge win.
161 <stub> it's not a bug, it was an operational issue
162 <Ursinha> indeed
163 <rockstar> herb, yes. If they are unrelated, it's probably a bug in one of our dependencies.
164 <stub> erm... if you are talking about the same one i'm thinking of.
165 <matsubara> stub: I'm talking about the permission for the SSO user
166 <stub> ok. different ;)
167 <matsubara> :-)
168 <matsubara> ok, anything else for herb?
169 <matsubara> thanks herb.
170 <herb> thanks matsubara
171 <matsubara> and thank mthaddon and spm for handling the rollout so well too!
172 <matsubara> moving on.
173 <herb> matsubara: will do
174 <matsubara> [TOPIC] * DBA report (stub)
175 <MootBot> New Topic: * DBA report (stub)
176 <stub> Today's database update ran in about 100 mins with all replicas enabled. Earlier calculations indicated the downtime would be a bit under three hours. The discrepancy arises because staging isn't as powerful and normal staging operations are underway during the restore.
177 <stub> This was good from a downtime perspective, but does mean we can no longer get reliable rollout timings from staging. When rollout times are a concern, we might have to test the database upgrade process on a production server and calculate the time from there.
178 <stub> I want to switch our master database to the new 16 core box from the current 8 core box in the next two weeks. This will require a few minutes downtime - I think a scheduled 10 minute outage will suffice. We might want to double up if there is other downtime required in the near future.
179 <stub> A few days ago, generating a table bloat report managed to mess up PostgreSQL, causing all queries to the master to generate nothing but errors. A forced restart was required, causing a few minutes of downtime total. The cause has been tracked down and is being worked on upstream, and we can avoid it now we know what it is (don't feed temporary tables to pgstattuple).
180 <stub> I've opened a couple of bugs about batch jobs that are taking too long. I generally don't care how long things take as long as their impact is light, but staging updates and post rollout processes are approaching 24 hours...
181 <stub> A number of problems were caused by missing PostgreSQL authorization to the new launchpad_auth user on production. This authorization was added to staging, but missed getting into the production rollout tasks. spm sorted it a few hours after the rollout as I understand it. This is a purely operational issue outside the scope of our test suite (staging is the test bed for database connection authorizations). Ignore OOPSes and bugs like 353
182 <stub> All from me.
183 <stub> Bug 353530
184 <ubottu> Launchpad bug 353530 in malone "OOPS filing a bug using the email interface " [Undecided,New] https://launchpad.net/bugs/353530
185 <Ursinha> stub, I have one oops, I don't know if it was just a hiccup
186 <Ursinha> stub, https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1188D1214
187 <ubottu> https://devpad.canonical.com/~jamesh/oops.cgi/1188D1214
188 <matsubara> [action] matsubara to talk to mrevell to announce a maintenance in the DB for about 10 min outage in the next 2 weeks. ask mrevell to talk to stub about it
189 <MootBot> ACTION received: matsubara to talk to mrevell to announce a maintenance in the DB for about 10 min outage in the next 2 weeks. ask mrevell to talk to stub about it
190 <stub> Ursinha: That's a bug needing fixing.
191 <Ursinha> stub, I'll file a bug about it now
192 <Ursinha> about the timeouts we mentioned during the week
193 <Ursinha> it seems they indeed dropped
194 <Ursinha> the main culprit now is the source package index page
195 <Ursinha> danilos, ^
196 <stub> Ok. So we need to be even less aggressive doing mass data migration.
197 <Ursinha> if the timeouts continue over the next few days, we'll have to chase another cause.
198 <danilos> stub, Ursinha: we'll have something similar coming up, how can we make sure the impact is not felt on our production machines?
199 <stub> danilos: Either set the acceptable lag setting lower, or a cooldown time after each batch.
200 <herb> stub: or both?
201 <danilos> stub: ok, I guess we'll have to experiment with these
202 <stub> or both
203 <matsubara> ok. I guess that's all for stub?
204 <matsubara> thanks stub
205 <Ursinha> thanks stub
206 <matsubara> I have a minor announcement that I forgot to add to the agenda
207 <matsubara> Next week is our second performance week
208 <matsubara> so, please add the bugs you're going to work on in https://dev.launchpad.net/PerformanceWeeks/April2009
209 <matsubara> and I think that's all
210 <matsubara> anything else before I close?
211 <matsubara> 3
212 <matsubara> 2
213 <matsubara> 1
214 <matsubara> Thank you all for attending this week's Launchpad Production Meeting. See the channel topic for the location of the logs.
215 <Ursinha> stub, bug 353897
216 <ubottu> Launchpad bug 353897 in launchpad-foundations "DisallowedStore OOPS in lpnet/+login" [Undecided,New] https://launchpad.net/bugs/353897
217 <matsubara> #endmeeting
218 <MootBot> Meeting finished at 10:39.