1 <matsubara> #startmeeting
   2 <MootBot> Meeting started at 10:00. The chair is matsubara.
   4 <Ursinha> roll call,roll call
   6 <sinzui> me
   8 <matsubara> hang on a second please
   9 <henninge> me
  13 <matsubara> Welcome to this week's Launchpad Production Meeting. For the next 45 minutes or so, we'll be coordinating the resolution of specific Launchpad bugs and issues. 
  15 <matsubara> [TOPIC] Roll Call 
  16 <MootBot> New Topic:  Roll Call
  17 <Ursinha> meeee
  18 <matsubara> Not on the Launchpad Dev team? Welcome! Come "me" with the rest of us! 
  19 <bigjools> me
  20 <stub> me
  21 <henninge> me again
  22 <intellectronica> i
  23 <matsubara> flacoste, hi
  24 <flacoste> me
  25 <matsubara> herb, hi
  26 <herb> me
  27 <matsubara> ok, everyone here.
  28 <matsubara> [TOPIC] Agenda 
  29 <MootBot> New Topic:  Agenda
  30 <matsubara>  * Actions from last meeting
  31 <matsubara>  * Oops report & Critical Bugs & Broken scripts
  32 <matsubara>  * Operations report (mthaddon/herb/spm)
  33 <matsubara>  * DBA report (stub)
  34 <matsubara> [TOPIC] * Actions from last meeting
  35 <MootBot> New Topic:  * Actions from last meeting
  36 <matsubara>  * matsubara to chase rockstar about failure on updatebranches script
  37 <matsubara>  * stub to give bug 354593 a try, with mars's help if needed
  38 <matsubara>  * stub to fix bug 310818
  39 <matsubara>  * mars to take a look at OOPS-1307J16
  40 <matsubara>  * Discuss the solution proposed by gary_poster after the meeting, about ExpatErrors and bug 403606
  41 <matsubara>  * mars and stub to discuss the Disconnection and OperationalErrors after the meeting
  42 <jml> me
  43 <ubottu> Launchpad bug 354593 in launchpad-foundations "SSO exceptions views need proper branding" [High,Triaged] https://launchpad.net/bugs/354593
  44 <ubottu> Launchpad bug 310818 in launchpad-foundations "Oops report does not always log timed-out query" [High,In progress] https://launchpad.net/bugs/310818
  45 <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=1307J16
  46 <Ursinha> yay, jml
  47 <matsubara> I suck, I didn't chase rockstar about the updatebranches script failures
  48 <ubottu> Launchpad bug 403606 in launchpad-registry "ExpatError errors should be handled to not generate the OOPSes" [Undecided,New] https://launchpad.net/bugs/403606
  49 <rockstar> matsubara, I thought we agreed that mwhudson would be better to chase on it.
  50 <jml> matsubara, got a URL for the failure?
  51 <matsubara> otoh, the script is not failing anymore...
  52 <rockstar> matsubara, I know mwhudson was looking at it on his Tuesday.
  53 <rockstar> jml would be good to ask as well.
  55 <matsubara> rockstar, all right. I'll talk to jml and mwhudson later on today
  56 <matsubara> [action] * matsubara to chase mwhudson/jml about failure on updatebranches script
  57 <MootBot> ACTION received:  * matsubara to chase mwhudson/jml about failure on updatebranches script
  58 <rockstar> matsubara, jml is here right now. :)
  59 <matsubara> jml, I'll get you a URL for the script failures after the meeting. I need to trawl my emails to find it
  60 <jml> matsubara, ok. thanks.
  61 <matsubara> stub, how's 354593 fix coming along?
  62 <flacoste> why is this High again?
  63 <matsubara> I wonder if mars had time to look over OOPS-1307J16
  64 <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=1307J16
  65 <matsubara> flacoste, do you know ^?
  66 <flacoste> hmm, i put it as such
  67 <flacoste> any reason it should be?
  68 <matsubara> flacoste, according to the bug history you made it high :-)
  69 <flacoste> debranding of the SSO is a U1/ISD affair anyway
  70 <stub> matsubara: Slow. I need to discuss with people how to actually do it - maybe next week on the sprint if I get time.
  71 * sinzui agrees with flacoste
  72 <flacoste> stub: i think we should try to get stu and James to do it :-)
  73 <flacoste> especially stu, it would be a good test case for knowledge transfer
  74 <stub> Anything that means I don't have to work out how ZPT macros work is fine by me.
  75 <flacoste> +1
  76 <matsubara> Ursinha, what's up with  "Discuss the solution proposed by gary_poster after the meeting, about ExpatErrors and bug 403606"?
  77 <ubottu> Launchpad bug 403606 in launchpad-registry "ExpatError errors should be handled to not generate the OOPSes" [Undecided,New] https://launchpad.net/bugs/403606
  78 <Ursinha> matsubara, the ExpatErrors were being discussed by mars and gary
  79 <gary_poster> matsubara: that's now registry. it actually is a legitimate oops
  80 <matsubara> [action] stub to delegate bug 354593 to ISD
  81 <ubottu> Launchpad bug 354593 in launchpad-foundations "SSO exceptions views need proper branding" [High,Triaged] https://launchpad.net/bugs/354593
  82 <MootBot> ACTION received:  stub to delegate bug 354593 to ISD
  83 <gary_poster> it indicates a problem with mailman integration
  84 <sinzui> I will ask barry to look into bug 403606
  85 <ubottu> Launchpad bug 403606 in launchpad-registry "ExpatError errors should be handled to not generate the OOPSes" [Undecided,New] https://launchpad.net/bugs/403606
  86 <matsubara> stub, you recently fixed a DisconnectionError bug. Was it related to the errors you discussed with mars? Is that action item now done?
  87 <matsubara> thanks sinzui and gary_poster 
  88 <gary_poster> :-)
  89 <stub> matsubara: I landed code to log OOPS reports on DisconnectionError before retrying the request. Is that what you mean?
  90 <matsubara> stub, I mean: "* mars and stub to discuss the Disconnection and OperationalErrors after the meeting"
  91 <Ursinha> stub, is that what caused the TransactionRollbackError oopses?
  92 <stub> We discussed. I don't recall much about the conversation though :)
  93 <matsubara> :-)
  94 <stub> Ursinha: That fix was, yes. I've got another branch that turns the volume down so we don't log the TransactionCommitErrors
  95 <matsubara> [action] sinzui to ask barry to fix bug 403606
  96 <ubottu> Launchpad bug 403606 in launchpad-registry "ExpatError errors should be handled to not generate the OOPSes" [Undecided,New] https://launchpad.net/bugs/403606
  97 <MootBot> ACTION received:  sinzui to ask barry to fix bug 403606
  98 <Ursinha> stub, good, I filed bug 409907 for that
  99 <ubottu> Launchpad bug 409907 in launchpad-foundations "TransactionRollbackErrors may prevent us to detect real issues" [Undecided,New] https://launchpad.net/bugs/409907
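The retry behaviour stub and Ursinha describe above (record an OOPS for a DisconnectionError before retrying the request, but stop logging the routine rollback errors tracked in bug 409907) can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the code that landed: the exception classes are local stand-ins for the real Storm/psycopg2 ones, and `run_with_retries` and its `oops_log` list are invented names.

```python
# Hypothetical stand-ins for the real Storm/psycopg2 exception classes.
class DisconnectionError(Exception):
    pass


class TransactionRollbackError(Exception):
    pass


def run_with_retries(request, max_attempts=3, oops_log=None):
    """Call `request()`, retrying transient database failures.

    Each DisconnectionError is recorded in `oops_log` *before* the retry,
    so a request that succeeds on a later attempt still leaves a trace.
    TransactionRollbackErrors are retried quietly, without an OOPS entry.
    The last failure is re-raised once `max_attempts` is exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    if oops_log is None:
        oops_log = []
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except DisconnectionError as error:
            # Log first, then retry: a later success no longer hides the fault.
            oops_log.append(("DisconnectionError", attempt, str(error)))
            if attempt == max_attempts:
                raise
        except TransactionRollbackError:
            # "Turning the volume down": routine rollbacks are not reported.
            if attempt == max_attempts:
                raise
```

The design point is the ordering: logging before the retry means a disconnection that a retry papers over is still visible in the OOPS reports, while the noisy but expected rollbacks no longer drown out real issues.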
 100 <matsubara> Ursinha, is there a bug for OOPS-1307J16?
 101 <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=1307J16
 102 <Ursinha> matsubara, not that I opened one, because we needed to know what was going on over there
 103 <Ursinha> to open the bug
 104 <Ursinha> so mars was going to investigate that
 105 <Ursinha> I don't recall having those anymore
 106 <matsubara> [action] ursinha to chase mars about OOPS-1307J16 and file a bug about it
 107 <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=1307J16
 108 <MootBot> ACTION received:  ursinha to chase mars about OOPS-1307J16 and file a bug about it
 110 <matsubara> I think that's all for last meeting's action items
 111 <matsubara> thanks everyone
 112 <matsubara> [TOPIC] * Oops report & Critical Bugs & Broken scripts
 113 <MootBot> New Topic:  * Oops report & Critical Bugs & Broken scripts
 114 <Ursinha> there are two issues to discuss
 115 <Ursinha> one was about bug 409907, that I already mentioned to stub and it's being handled
 116 <ubottu> Launchpad bug 409907 in launchpad-foundations "TransactionRollbackErrors may prevent us to detect real issues" [Undecided,New] https://launchpad.net/bugs/409907
 117 <Ursinha> the other is about the select replication_lag() timeouts we're having
 118 <Ursinha> mthaddon also reported problems that may or may not be related to that
 119 <Ursinha> I don't know if there's much to be discussed at this point, because it seems we need to fix oops reports first to be able to see the real problem here
 120 <Ursinha> is that correct stub:
 121 <Ursinha> ?
 123 <matsubara> should we request a CP for the branch that fixes the oops log?
 124 <intellectronica> given that we're skipping a release, that's probably a good idea
 125 <Ursinha> flacoste, I spoke with jtv yesterday about those, and he also said that it was unlikely to be his changes' fault (possible but unlikely)
 126 <stub> I landed code today that should tell us more about whether the timeout is actually occurring due to blocking on the database or elsewhere.
 127 <Ursinha> s/his/translations/
 128 <Ursinha> stub, should we request a CP?
 129 <flacoste> yeah, i really think a CP is a good idea
 130 <Ursinha> (please please)
 131 <matsubara> [action] stub to request CP for his branch that fixes oops logging
 132 <MootBot> ACTION received:  stub to request CP for his branch that fixes oops logging
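stub's new logging is meant to show whether a timed-out request was blocked on the database or stuck somewhere else. A minimal sketch of that kind of attribution, assuming the duration of every SQL statement the request ran (including time spent waiting on locks) has been recorded: `classify_timeout` and its 50% cut-off are hypothetical, not what actually landed.

```python
def classify_timeout(total_seconds, statement_seconds, db_share=0.5):
    """Attribute a timed-out request to the database or to other work.

    total_seconds: wall-clock time of the request when it timed out.
    statement_seconds: durations of the SQL statements it ran, including
        any time spent blocked waiting for locks.
    db_share: hypothetical cut-off; at or above this fraction of the
        request spent in SQL, the timeout is called database-bound.
    """
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    db_time = sum(statement_seconds)
    if db_time / total_seconds >= db_share:
        return "database"
    return "elsewhere"
```

For example, a 12-second timeout in which one statement sat for 9.5 seconds would be attributed to the database, while one that spent only a fraction of a second in SQL points at the app server (or the network) instead.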
 133 <Ursinha> cool
 134 <Ursinha> we have two critical bugs, already fix committed
 135 <Ursinha> so, good
 136 <matsubara> cool
 137 <Ursinha> about the failing scripts
 138 <matsubara> we had some scripts failing this week
 139 <matsubara> nightly, productreleasefinder and garbo-hourly
 140 <matsubara> and rosetta-poimport too
 141 <matsubara> nightly was already addressed by jtv
 142 <Ursinha> matsubara, productreleasefinder isn't expected to fail anymore? sinzui?
 143 <matsubara> as a rosetta script was taking too much time; jtv will remove it from nightly and add a separate cronjob for it
 144 <sinzui> Ursinha: no, but the errors I see are not failures... the script was not run
 145 <matsubara> stub, do you know why garbo-hourly is failing?
 146 <stub> Its failing?
 147 <sinzui> matsubara: many scripts are not running because of one log process
 148 <matsubara> henninge, rosetta-poimport failed on the 5th. can you investigate and reply to the list?
 149 <sinzui> s/log/long/
 150 <Ursinha> matsubara, it's not being run, it seems
 152 <henninge> matsubara: sure, I will.
 153 <matsubara> stub, I got a few emails: "Scripts failed to run: loganberry:garbo-hourly" 
 154 <sinzui> Ursinha, matsubara: there is some traffic about this. spm reported the long-running process a week ago. I had asked why the prf had not run
 155 <matsubara> and no replies to the list, so I'm asking here
 156 <matsubara> thanks henninge 
 157 <Ursinha> matsubara, actually stub repklied
 158 <Ursinha> *replied
 159 <stub> Oh - there were some blocked runs because the rosetta export-to-branch script was running in a 5 hour long transaction
 160 <stub> So the script blocks because it doesn't want to make anything worse.
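The blocking stub describes can be sketched as a pre-flight check: before doing any work, the script looks for transactions that have been open longer than some limit and skips the run if it finds one. One plausible reason, stated here as an assumption rather than the script's documented rationale, is that PostgreSQL's VACUUM cannot reclaim rows deleted while an older transaction is still open, so a garbage-collection run would only add bloat. In production the start times could come from the `xact_start` column of PostgreSQL's `pg_stat_activity` view; here they are plain tuples, and `blocking_transactions`, `maybe_run` and the one-hour limit are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical limit; the log does not say what threshold the scripts use.
MAX_TRANSACTION_AGE = timedelta(hours=1)


def blocking_transactions(open_transactions, now, max_age=MAX_TRANSACTION_AGE):
    """Return the names of transactions open longer than `max_age`.

    open_transactions: iterable of (name, started_at) pairs, e.g. built
    from pg_stat_activity.xact_start in a real deployment.
    """
    return [name for name, started_at in open_transactions
            if now - started_at > max_age]


def maybe_run(job, open_transactions, now):
    """Run `job` only if no long transaction is open; report the outcome."""
    blockers = blocking_transactions(open_transactions, now)
    if blockers:
        # Skip rather than create dead rows that VACUUM cannot yet reclaim.
        return "skipped: blocked by " + ", ".join(blockers)
    job()
    return "ran"
```

With the rosetta export-to-branch script five hours into its transaction, a check like this would report the run as skipped, which matches the "Scripts failed to run" emails rather than an actual script error.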
 161 <matsubara> [action] henninge to investigate rosetta-poimport script failure on the Aug 5th and report back to the list
 162 <MootBot> ACTION received:  henninge to investigate rosetta-poimport script failure on the Aug 5th and report back to the list
 164 <Ursinha> so I guess it's ok
 165 <Ursinha> that's all for this section
 166 <Ursinha> from me
 167 <Ursinha> thanks everyone
 168 <Ursinha> !
 169 <matsubara> all right. thanks everyone
 170 <Ursinha> you can move on matsubara
 171 <matsubara> [TOPIC] * Operations report (mthaddon/herb/spm)
 172 <MootBot> New Topic:  * Operations report (mthaddon/herb/spm)
 173 <herb> 2009-07-31 - Rolled out r8323 to bzrsyncd
 174 <herb> 2009-08-05 - Cherry picks for code imports, lpnet* and the script server.
 175 <herb> Our monitoring system has been timing out in connecting to the app servers more often this week. Admittedly its timeout is set lower than the OOPS timeout. But we've also been noticing higher load on the app servers as well. This was discussed by Ursinha during the oops/critical bugs/broken scripts section.
 176 <herb> There's currently 1 cherry pick and 1 database query awaiting (dis)approval.
 177 <herb> The LOSAs currently have 14 bugs marked High and Triaged, only 1 of which is assigned to someone and targeted for a release. We would be grateful to see some movement on these.
 178 <herb> We're currently running with a single slave in preparation for the sprint next week.
 179 <mthaddon> also wanted to check that there should be a cherry pick request for the cowboyed storm change to lpnet9 and lpnet10 (per the production status wiki page)
 180 <flacoste> cowboyed storm change?
 181 <mthaddon> flacoste: https://pastebin.canonical.com/20503/ under eggs/storm-0.14salgado_storm_launchpad_288_308-py2.4-linux-i686.egg
 182 <flacoste> mthaddon, herb: i'll look at the LPS to approve/decline
 183 <flacoste> right
 184 <matsubara> herb, do you keep that list of 14 bugs somewhere? in a wiki page or have a tag to group them?
 185 <flacoste> mthaddon: the cherry pick would simply be to update that dependency
 186 <herb> matsubara: bugs.launchpad.net/~canonical-losas
 187 <mthaddon> flacoste: well in any case, the CP that was requested (and performed) yesterday overwrote it, so it needs to be formalised so other CPs don't overwrite it again
 188 <flacoste> sinzui: can salgado make an appropriate CP request?
 189 <sinzui> Yes
 190 <flacoste> it's simply a new upload to download-cache with a versions.cfg change
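flacoste's description of the cherry pick (an upload to download-cache plus a versions.cfg change) relies on buildout's `[versions]` section, which pins each package to an exact version that must then be available when the tree is built. Once the pin is committed, later cherry picks rebuild the same storm instead of overwriting the cowboyed egg. A minimal sketch of checking that the two halves agree, using Python's configparser: the storm version string is taken from the egg name in mthaddon's paste, while the cache file names and both helper functions are hypothetical.

```python
import configparser

# Hypothetical versions.cfg fragment; the version string comes from the
# storm egg name quoted in the log.
VERSIONS_CFG = """
[versions]
storm = 0.14salgado_storm_launchpad_288_308
"""


def pinned_version(cfg_text, package):
    """Return the version the [versions] section pins `package` to."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return parser.get("versions", package)


def is_cached(cfg_text, package, cache_files):
    """True if download-cache holds a distribution matching the pin."""
    base = "%s-%s" % (package, pinned_version(cfg_text, package))
    # Accept a source tarball/zip for that exact version, or an egg
    # built for it ("name-version-pyX.Y-...egg").
    return any(
        name in (base + ".tar.gz", base + ".zip") or name.startswith(base + "-py")
        for name in cache_files
    )
```

Matching the full `name-version` string, rather than a loose prefix, avoids treating a plain `storm-0.14` tarball as satisfying the patched pin.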
 191 <matsubara> sinzui, flacoste, intellectronica, rockstar: Could you take a look at herb's bug list (bugs.launchpad.net/~canonical-losas) and see what your teams can do about the high ones in the short term?
 192 <flacoste> ok
 193 <herb> clearly we're not looking for all of them to be fixed by the next meeting (though that would be great ;)
 194 <herb> just mostly would like to know they're staying on the right radars and are being worked on as appropriate.
 195 <matsubara> cool
 196 <matsubara> anything else for herb?
 197 <intellectronica> herb: so, basically, these are mostly bugs which will make life easier for you when fixed?
 198 <sinzui> bug 348722 should become invalid when we update all pmt teams to become true private teams
 199 <ubottu> Launchpad bug 348722 in launchpad-code "Set default branch visibility to "forbidden" if any team set to 'Private'" [High,Triaged] https://launchpad.net/bugs/348722
 200 <herb> intellectronica: some of them are genuine operational issues, some of them are quality of life issues for the LOSAs
 201 <sinzui> There should be no private-membership teams at the start of week 1
 202 <intellectronica> cool, sure, we'll take a look and see if there's any low hanging fruit
 203 <sinzui> barry will be working with the losas on August 11 to fix bug 325962
 204 <ubottu> Launchpad bug 325962 in launchpad-registry "lp-mailman startup is blocking on a pid file in the wrong directory" [High,Triaged] https://launchpad.net/bugs/325962
 205 <herb> sinzui: that was the one that was assigned and targeted at a release.
 206 <sinzui> herb, many times
 207 <herb> assigned even
 208 <herb> heh
 209 <matsubara> all right. I think that's it
 210 <sinzui> herb, it failed my rule: a bug is not High if it has not been worked on by all parties in 3 months
 211 <herb> thanks
 212 <matsubara> thanks herb and everyone
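sinzui's triage rule (a bug should not stay High if nobody has worked on it for 3 months) is simple enough to express as a check. A hypothetical sketch: "3 months" is approximated as 90 days, the per-party activity dates are invented inputs, and `violates_high_rule` is not an existing Launchpad function.

```python
from datetime import date, timedelta

# sinzui's "3 months", approximated here as 90 days.
STALE_AFTER = timedelta(days=90)


def violates_high_rule(importance, last_activity, today, stale_after=STALE_AFTER):
    """True if a High bug has gone `stale_after` without work by any party.

    last_activity: mapping of party name -> date of that party's most
    recent work on the bug (hypothetical input).
    """
    if importance != "High":
        return False
    if not last_activity:
        # No recorded work from anyone at all.
        return True
    most_recent = max(last_activity.values())
    return today - most_recent > stale_after
```

Taking the most recent date across all parties means any one team's work keeps the bug legitimately High; only when everyone has left it alone does the rule flag it.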
 213 <matsubara> [TOPIC] * DBA report (stub)
 214 <MootBot> New Topic:  * DBA report (stub)
 215 <stub> We set off some alerts when the poimport script and PostgreSQL decided that lots of disk space should be used. We see some smaller spikes, which are just PG using disk to store intermediate results, but this time it was large enough to set off the alarms.
 216 <stub> We have seen this once before, and in neither case have we been able to repeat it. My best hypothesis is that the planner statistics triggered a really bad query plan, so I'll bump the planner statistics sample size on the production dbs in case that prevents future occurrences.
 217 <matsubara> henninge, maybe the last rosetta-poimport failure was related to that ^
 218 <henninge> matsubara: I believe we already know what it was about and it may be related to that.
 219 <henninge> matsubara: I'll talk to the guys.
 220 <matsubara> henninge, cool. thanks
 221 <matsubara> stub, anything else?
 222 <stub> Not that I can think of
 223 <matsubara> all right. thank you stub 
 224 <matsubara> I guess that's all for today
 225 <matsubara> Thank you all for attending this week's Launchpad Production Meeting. See https://dev.launchpad.net/MeetingAgenda for the logs. 
 226 <matsubara> #endmeeting 
 227 <MootBot> Meeting finished at 10:44.

