DevelopmentMeeting20090226

   1 <matsubara> #startmeeting
   2 <MootBot> Meeting started at 09:00. The chair is matsubara.
   3 <MootBot> Commands Available: [TOPIC], [IDEA], [ACTION], [AGREED], [LINK], [VOTE]
   4 <matsubara> Welcome to this week's Launchpad Production Meeting. For the next 45 minutes or so, we'll be coordinating the resolution of specific Launchpad bugs and issues.
   5 <matsubara> [TOPIC] Roll Call 
   6 <MootBot> New Topic:  Roll Call
   7 <matsubara> Not on the Launchpad Dev team? Welcome! Come "me" with the rest of us! 
   8 <henninge> me
   9 <Ursinha> me
  10 <matsubara> Ursinha, flacoste, bigjools, intellectronica, herb
  11 <bigjools> me
  12 <herb> me
  13 <matsubara> bac, ping
  14 <flacoste> me
  15 <Ursinha> matsubara, already answered
  16 <intellectronica> me
  17 <matsubara> rockstar, hi
  18 <rockstar> me
  19 <rockstar> matsubara, hi
  20 <bac> me
  21 <matsubara> ok, stub can join later. everyone else is here.
  22 <matsubara> [TOPIC] Agenda 
  23 <MootBot> New Topic:  Agenda
  24 <matsubara>  * Actions from last meeting
  25 <matsubara>  * Oops report & Critical Bugs 
  26 <matsubara>  * Operations report (mthaddon/herb/spm)
  27 <matsubara>  * DBA report (DBA contact)
  28 <matsubara> [TOPIC] * Actions from last meeting
  29 <MootBot> New Topic:  * Actions from last meeting
  30 <matsubara>  * stub to investigate the fix to avoid staging restore problems
  31 <matsubara>  * matsubara to chase rockstar about a fix for OOPS-1138CEMAIL12
   32 <matsubara>     * asked jml about this. It's bug 326056 and had its importance raised.
  33 <matsubara>  * cprov and bigjools to investigate OOPS-1145EA14
  34 <matsubara>  * Ursinha to file bugs:
  35 <matsubara>     * Bug 333072: https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1143EB189
  36 <matsubara>     * Bug 333071: https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1145EA14
  37 <ubottu> Launchpad bug 326056 in launchpad-bazaar "OOPS on BadStateTransition when reviewing code by mail" [High,Triaged] https://launchpad.net/bugs/326056
  38 <ubottu> Launchpad bug 333072 in soyuz "AttributeError OOPS on Build:+index" [Undecided,Invalid] https://launchpad.net/bugs/333072
  39 <ubottu> Launchpad bug 333071 in soyuz "AssertionError OOPS on +copy-packages" [High,Triaged] https://launchpad.net/bugs/333071
  40 <bigjools> 333072 is invalid
  41 <matsubara> bigjools, any news about 333071?
  42 <bigjools> yes, it's not too serious, we've set it for 2.2.3
  43 <bigjools> it's a corner case in the copying
  44 <bigjools> despite the doom-mongering error message
  45 <matsubara> ok. thanks bigjools 
  46 <matsubara> [action] matsubara to chase stub about staging restore problems
  47 <MootBot> ACTION received:  matsubara to chase stub about staging restore problems
  48 <matsubara> [TOPIC] * Oops report & Critical Bugs 
  49 <MootBot> New Topic:  * Oops report & Critical Bugs
  50 * matsubara hands Ursinha the mic
  51 * Ursinha looks
  52 * rockstar runs
  53 <Ursinha> registry, foundations, code and bugs: oopses for you
  54 <Ursinha> Registry:-
  55 <Ursinha> https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1153E919
  56 <Ursinha> https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1153A1135 (or foundations, not sure)
  57 <Ursinha> Foundations:-
  58 <Ursinha> https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1153D667
  59 <Ursinha> Code:
  60 <ubottu> https://devpad.canonical.com/~jamesh/oops.cgi/1153E919
  61 <ubottu> https://devpad.canonical.com/~jamesh/oops.cgi/1153A1135
  62 <Ursinha> https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1152XMLP1
  63 <ubottu> https://devpad.canonical.com/~jamesh/oops.cgi/1153D667
  64 <Ursinha> Bugs:
  65 <Ursinha> https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1152EA162
  66 <Ursinha> ~
  67 <Ursinha> rockstar, ha!
  68 <Ursinha> rockstar, have you seen this one: https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1152XMLP1?
  69 <rockstar> Ursinha, looking at all of them now.
  70 <Ursinha> rockstar, you can just look at code's one :)
  71 <Ursinha> sinzui, hi
  72 <Ursinha> sinzui, I'm not sure if https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1153A1135 is foundations or registry
  73 <ubottu> https://devpad.canonical.com/~jamesh/oops.cgi/1153A1135
  74 <matsubara> Ursinha, looks like registry
  75 <intellectronica> Ursinha: strange. do you see lots of those?
  76 <bac> Ursinha: yes, looks like registry
  77 <Ursinha> intellectronica, no, actually not
  78 <Ursinha> intellectronica, but never saw one of those before
  79 <Ursinha> so better bring to attention
  80 <matsubara> intellectronica, Ursinha that one looks like caused by the rollout
  81 <sinzui> Ursinha I don't know the answer either. I will look into it and assign it. I suspect salgado-afk is working on
  82 <Ursinha> matsubara, even in the time it happened?
  83 <intellectronica> matsubara: i also thought so
  84 * salgado-afk is now known as salgado
  85 <intellectronica> but it is quite early
  86 <Ursinha> intellectronica, I've discarded the rollout possibility because of its timestamp
  87 <Ursinha> sinzui, thanks for that
  88 <matsubara> yeah, too early to be caused by the rollout. 
  89 <Ursinha> intellectronica, can you take a look then, please?
  90 <rockstar> Ursinha, I'll have to investigate our oops.  It's the XML-RPC server, and it requires the sacrifice of a virgin goat.
  91 <matsubara> check OSAs incident log to see if something happened during that time
  92 <intellectronica> so, this isn't really a bugs oops, but i don't know whether it's rollout-related or not. fwiw it's more than three hours before rollout, so it's hard to see how it would be related
  93 <Ursinha> rockstar, oh, I have a bunch here in my backyard if you need some
  94 <rockstar> Ursinha, :)
  95 <Ursinha> intellectronica, I'll do what matsubara suggested
  96 <matsubara> [action] ursinha to check OSAs incident log to help identify cause of OOPS-1152EA162
  97 <MootBot> ACTION received:  ursinha to check OSAs incident log to help identify cause of OOPS-1152EA162
  98 <Ursinha> thanks intellectronica and matsubara
  99 <matsubara> [action] rockstar to investigate xmlrpc oops OOPS-1152XMLP1
 100 <MootBot> ACTION received:  rockstar to investigate xmlrpc oops OOPS-1152XMLP1
 101 <Ursinha> flacoste, hi
  102 <henninge> Translations is happy that POFile:+translate has now dropped out of the timeout top ten...
 103 <henninge> btw
 104 <henninge> ;)
 105 <Ursinha> henninge, indeed, congrats to translate team :)
 106 * stub (n=stub@canonical/launchpad/stub) has joined #launchpad-meeting
 107 <Ursinha> translations
 108 <Ursinha> there he is :)
 109 <henninge> Ursinha: thank you, I will pass it on.
 110 <Ursinha> sinzui, about the other oops
 111 <stub> Sorry - on a call and didn't realize the time
 112 <sinzui> bac: can you look at it.
 113 <bac> Ursinha: they seem to be related (acting for sinzui today)
 114 * sinzui is in another meeting
 115 <flacoste> hmm
 116 <flacoste> i'd say registry
 117 * cumulus007 (n=sander@unaffiliated/cumulus007) has joined #launchpad-meeting
 118 <bac> yes, i think registry for both
 119 <flacoste> Ursinha are you talking about OOPS-1153A1135?
 120 <ubottu> https://devpad.canonical.com/~jamesh/oops.cgi/1153A1135
  121 <Ursinha> bac, hi :) so, can you take a look at both oopses? do you need me to file bugs about them?
 122 * Ursinha looks
 123 <bac> Ursinha: yes i'll look at them both
 124 <bac> i can open the bugs
 125 <bac> unless you need the karma
 126 <Ursinha> flacoste, no, https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1153D667
 127 <ubottu> https://devpad.canonical.com/~jamesh/oops.cgi/1153D667
 128 <Ursinha> bac, haha, no
 129 <flacoste> Ursinha: that's also a registry query
 130 <Ursinha> [action] bac to file bugs and take care of https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1153E919 and https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1153A1135
 131 <ubottu> https://devpad.canonical.com/~jamesh/oops.cgi/1153E919
 132 <ubottu> https://devpad.canonical.com/~jamesh/oops.cgi/1153A1135
 133 <matsubara> [action] bac to file bugs for OOPS-1153E919 and OOPS-1153A1135 
 134 <MootBot> ACTION received:  bac to file bugs for OOPS-1153E919 and OOPS-1153A1135
 135 <ubottu> https://devpad.canonical.com/~jamesh/oops.cgi/1153E919
 136 <ubottu> https://devpad.canonical.com/~jamesh/oops.cgi/1153A1135
 137 <ubottu> https://devpad.canonical.com/~jamesh/oops.cgi/1153E919
 138 <ubottu> https://devpad.canonical.com/~jamesh/oops.cgi/1153A1135
 139 <bac> wow, y'all are insistent today!  :)
 140 <Ursinha> :)
 141 <Ursinha> flacoste, hm.
 142 <Ursinha> thanks
 143 <Ursinha> bac, can you take a look at that too?
 144 <bac> which?
 145 <Ursinha> promise not to paste the oops again
 146 * danilo-afk is now known as danilos
 147 <Ursinha> bac, https://devpad.canonical.com/~jamesh/oops.cgi/1153D667
 148 <Ursinha> I tried :)
 149 * bac looks
 150 <bac> yes
 151 <Ursinha> bac, thanks
 152 <Ursinha> that's all from me from the oops land
 153 <matsubara> [action] bac to also file a bug and take care of OOPS-1153D667
 154 <ubottu> https://devpad.canonical.com/~jamesh/oops.cgi/1153D667
 155 <MootBot> ACTION received:  bac to also file a bug and take care of OOPS-1153D667
 156 <ubottu> https://devpad.canonical.com/~jamesh/oops.cgi/1153D667
 157 <matsubara> ok, thanks everyone.
 158 <matsubara> [TOPIC] * Operations report (mthaddon/herb/spm)
 159 <MootBot> New Topic:  * Operations report (mthaddon/herb/spm)
 160 <Ursinha> there's one critical bug, though
 161 <Ursinha> argh
 162 <Ursinha> bad bad timing
 163 <herb> shall I wait for the critical bug?
  164 <matsubara> danilo is handling the critical bug, so I won't duplicate what's in the bug report.
 165 <Ursinha> herb, just a second, let me check with henninge
 166 <matsubara> it's bug 334787
 167 <Ursinha> matsubara, okay, if you say so
 168 <ubottu> Launchpad bug 334787 in rosetta "Ubuntu packagers are not translation editors (assertion error)" [Critical,In progress] https://launchpad.net/bugs/334787
 169 <matsubara> let's move on
 170 <Ursinha> go ahead herb, thanks
 171 <herb> 2009-02-20 - We had an issue that may have caused some users to experience intermittent outages on Launchpad. I worked with joey and flacosted to find the issue. joey's notes were sent to the list. I would be interested in hearing any updates we might have on this issue.
  172 <herb> 2009-02-21 and 2009-02-22 - It appears we had a bit of buggy code land on edge that caused a performance problem on both edge and production. The revision was backed out and I believe the code has been fixed.
 173 <herb> 2009-02-26 - We rolled out 2.2.2 based on r7763
 174 <herb> We continue to see problems relating to bug #156453 and bug #118625. So much so that we're going to start bouncing codebrowse regularly to hopefully head off any issues. I want to emphasize that this will be masking the problem and we really do need to find the root cause and fix it.
 175 <ubottu> Launchpad bug 156453 in loggerhead "production loggerhead branch leaks memory" [Critical,Triaged] https://launchpad.net/bugs/156453
 176 <ubottu> Launchpad bug 118625 in launchpad-bazaar "codebrowse sometimes hangs" [High,Triaged] https://launchpad.net/bugs/118625
  177 <herb> Bug #260171 continues to creep up regularly (every few days). This is already marked as high and I know that mwhudson's plate is full with codebrowse issues, but can we get an update on this one?
 178 <ubottu> Bug 260171 on http://launchpad.net/bugs/260171 is private
 179 * herb somehow managed to change flacoste into a verb.
 180 <danilos> matsubara, Ursinha: I am running tests on the critical bug fix, will let you know once it has landed
 181 <flacoste> i saw!
 182 <bac> i've been flacosted!
 183 <matsubara> danilos, thanks
 184 <Ursinha> thanks danilos
 185 <matsubara> rockstar, can you bring up the codebrowse issue to the code team?
 186 <rockstar> matsubara, everyday.  :)
 187 <matsubara> rockstar, thanks :-)
  188 <rockstar> Codebrowse is being ACTIVELY worked on.  It'd be nice if we knew what the issue is.  Right now, we're just fixing things and hoping that was the problem.
 189 <herb> rockstar: let the losas know if there is anything we can do to help.
 190 <rockstar> herb, we certainly will.
  191 <stub> Should we be bringing in any outside help to instrument, test and diagnose the issue?
 192 <matsubara> herb, anything happened to the DB during the time of this OOPS-1152EA162?
 193 <matsubara> or maybe stub might know ^
 194 <herb> matsubara: nothing in the incident log.
 195 <stub> matsubara: That is one of the connection reaper scripts kicking in
 196 <herb> matsubara: I think that's also on the void between LOSAs.
 197 <herb> ah, there we go.
  198 <stub> We kill connections idle in a transaction for more than a few hours (and should be more aggressive), and appserver connections that have been in a transaction for more than 2 minutes.
 199 <Ursinha> stub, I see
 200 <matsubara> stub, ok. so if we start seeing too many of those, we have a problem somewhere and a few is kinda normal?
 201 <stub> The notification gets sent to the error-reports list (where we can confirm that this is indeed what happened)
 202 <matsubara> stub, aha. that's better. I'll chase the lp-errors for that one
 203 <matsubara> s/lp-errors/lp-errors list/
 204 <stub> If we see many of them, we have a problem. One is probably a problem - appserver requests taking two minutes on the db means we need to investigate why the normal timeout mechanisms didn't work.
 205 <matsubara> [action] matsubara to look lp-errors list to determine cause of OOPS-1152EA162
 206 <MootBot> ACTION received:  matsubara to look lp-errors list to determine cause of OOPS-1152EA162
 207 <matsubara> right. thanks for the explanation
 208 <stub> -1 second non-sql time, 0 seconds total time indicates a problem at the appserver? The request never got started?
 209 <matsubara> I'll file a bug about that one and we can discuss there
 210 <stub> hmm... might be a reconnection bug - perhaps the previous request handled by that thread got killed?
 211 <stub> I don't know if we Retry on DisconnectionError exceptions, or if it is a good idea in all cases.
 212 <matsubara> ok
 213 <matsubara> [TOPIC] * DBA report (stub)
 214 <MootBot> New Topic:  * DBA report (stub)
 215 <matsubara> and thanks herb and stub 
 216 <stub> New hardware exists and is being brought online by IS. I've realized I might need to tweak the db maintenance scripts (upgrade.py, security.py etc.) to cope with a third replica - I think it only copes with a single master and slave at the moment.
 217 <stub> Staging can be moved by the LOSAs as soon as the hardware is available and they have time, which will move that load from the production systems.
 218 <stub> I assume the rollout went fine as far as the db upgrade procedure goes.
 219 <herb> I assume it did too. I didn't hear any complaints from my colleagues.
 220 <matsubara> stub, great news! with the new hardware we won't have the staging restore problems anymore?
 221 <herb> stub: what's the plan with the 3rd replica?
 222 <stub> The staging restore problems should no longer be a problem.
 223 * herb feels like he missed something
  224 <stub> herb: We can start by pointing half the appservers at the new slave when it is online. We really should get a connection pool/load balancer thingy running though, like pgbouncer or pgpool 1 or 2.
 225 <herb> stub: gotcha
 226 <stub> herb: I realized just now though that upgrade.py won't apply patches to a third replica, which would be bad. So that needs to be fixed.
 227 <herb> yeah. that's important.
 228 <stub> Or actually, slonik may take care of all that. I need to confirm anyway.
 229 <stub> I forget and it is too late for my brain :)
 230 <stub> erm... late as in evening
 231 <matsubara> all right. I guess that's all unless there are questions for stub
 232 <matsubara> thanks stub 
 233 <matsubara> Thank you all for attending this week's Launchpad Production Meeting. See the channel topic for the location of the logs. 
 234 <matsubara> #endmeeting 
 235 <MootBot> Meeting finished at 09:42.
 236 <intellectronica> thanks matsubara
 237 <flacoste> hey
 238 <flacoste> matsubara: question
 239 <flacoste> do we need a new roll-out?
 240 <flacoste> and i think it applies to everyone here
  241 <matsubara> flacoste, I was on vacation and need to check that
  242 <flacoste> does anyone require a new roll-out?
  243 <matsubara> but I think there's at least danilos' bug to re-roll
 244 <bac> flacoste: i don't know of any issues for us
 245 <danilos> matsubara, flacoste: yes
 246 <stub> I thought it was policy to let enough bugs through qa to require a rerollout?
  247 <flacoste> we're getting better at QA, stub
 248 <flacoste> even the code team weren't that late this cycle :-)
 249 <matsubara> ok, so we'll need a re-roll for translations. need to check for the other teams, but so far, there's nothing on the radar
  250 <stub> We need a counter somewhere - 'Launchpad has been running for n days without needing a release-critical patch'
 251 <Ursinha> stub, :)
 252 <matsubara> I think that's all then. thanks everyone

DevelopmentMeeting20090226 (last edited 2009-02-26 19:43:43 by matsubara)