## Template for LP Production Meeting logs. Just paste xchat log below and the format IRC line will take care of formatting correctly
#format IRC
#startmeeting
Meeting started at 10:00. The chair is matsubara.
Commands Available: [TOPIC], [IDEA], [ACTION], [AGREED], [LINK], [VOTE]
roll call,roll call
my firefox died
me
* stub belches
hang on a second please
me
poor matsubara
* jml eavesdrops
* bigjools wafts stub's belch away
Welcome to this week's Launchpad Production Meeting. For the next 45 minutes or so, we'll be coordinating the resolution of specific Launchpad bugs and issues.
ni
[TOPIC] Roll Call
New Topic: Roll Call
meeee
Not on the Launchpad Dev team? Welcome! Come "me" with the rest of us!
me
me
me again
i
flacoste, hi
me
herb, hi
me
ok, everyone here.
[TOPIC] Agenda
New Topic: Agenda
* Actions from last meeting
* Oops report & Critical Bugs & Broken scripts
* Operations report (mthaddon/herb/spm)
* DBA report (stub)
[TOPIC] * Actions from last meeting
New Topic: * Actions from last meeting
* matsubara to chase rockstar about failure on updatebranches script
* stub to give a try on bug 354593 with mars help if needed
* stub to fix bug 310818
* mars to take a look at OOPS-1307J16
* Discuss the solution proposed by gary_poster after the meeting, about ExpatErrors and bug 403606
* mars and stub to discuss the Disconnection and OperationalErrors after the meeting
me
Launchpad bug 354593 in launchpad-foundations "SSO exceptions views need proper branding" [High,Triaged] https://launchpad.net/bugs/354593
Launchpad bug 310818 in launchpad-foundations "Oops report does not always log timed-out query" [High,In progress] https://launchpad.net/bugs/310818
https://lp-oops.canonical.com/oops.py/?oopsid=1307J16
yay, jml
I suck, I didn't chase rockstar about the updatebranches script failures
Launchpad bug 403606 in launchpad-registry "ExpatError errors should be handled to not generate the OOPSes" [Undecided,New] https://launchpad.net/bugs/403606
matsubara, I thought we agreed that mwhudson would be better to chase on it.
matsubara, got a URL for the failure?
otoh, the script is not failing anymore...
matsubara, I know mwhudson was looking at it on his Tuesday. jml would be good to ask as well.
* cprov is now known as cprov-lunch
rockstar, all right. I'll talk to jml and mwhudson later on today
[action] * matsubara to chase mwhudson/jml about failure on updatebranches script
ACTION received: * matsubara to chase mwhudson/jml about failure on updatebranches script
matsubara, jml is here right now. :)
jml, I'll get you an url for the scripts after the meeting. I need to trawl my emails to find it
matsubara, ok. thanks.
stub, how's 354593 fix coming along?
why is this High again?
I wonder if mars had time to look over OOPS-1307J16
https://lp-oops.canonical.com/oops.py/?oopsid=1307J16
flacoste, do you know ^?
hmm, i put it as such any reason it should be?
flacoste, according to the bug history you made it high :-)
debranding of the SSO is a U1/ISD affair anyway
matsubara: Slow. I need to discuss with people how to actually do it - maybe next week on the sprint if I get time.
* sinzui agrees with flacoste
stub: i think we should try to get stu and James to do it :-)
especially, stu, it would be a test good case for transfer knowledge
Anything that means I don't have to work out how ZPT macros works is fine by me.
+1
Ursinha, what's up with "Discuss the solution proposed by gary_poster after the meeting, about ExpatErrors and bug 403606"?
Launchpad bug 403606 in launchpad-registry "ExpatError errors should be handled to not generate the OOPSes" [Undecided,New] https://launchpad.net/bugs/403606
matsubara, the ExpatErrors were being discussed by mars and gary
matsubara: that's now registry.
it actually is a legitimate oops
[action] stub to delegate bug 354593 to ISD
Launchpad bug 354593 in launchpad-foundations "SSO exceptions views need proper branding" [High,Triaged] https://launchpad.net/bugs/354593
ACTION received: stub to delegate bug 354593 to ISD
it indicates a problem with mailman integration
I will ask barry to look into bug 403606
Launchpad bug 403606 in launchpad-registry "ExpatError errors should be handled to not generate the OOPSes" [Undecided,New] https://launchpad.net/bugs/403606
stub, You recently fixed a DisconnectionError bug. was it related to the errors you discussed with mars? that action item is now done?
thanks sinzui and gary_poster :-)
matsubara: I landed code to log OOPS reports on DisconnectionError before retrying the request. Is that what you mean?
stub, I mean: "* mars and stub to discuss the Disconnection and OperationalErrors after the meeting"
stub, is that what caused the TransactionRollbackError oopses?
We discussed. I don't recall much about the conversation though :)
:-)
Ursinha: That fix was, yes. I've got another branch that turns the volume down so we don't log the TransactionCommitError's
[action] sinzui to ask barry to fix bug 403606
Launchpad bug 403606 in launchpad-registry "ExpatError errors should be handled to not generate the OOPSes" [Undecided,New] https://launchpad.net/bugs/403606
ACTION received: sinzui to ask barry to fix bug 403606
stub, good, I filed bug 409907 for that
Launchpad bug 409907 in launchpad-foundations "TransactionRollbackErrors may prevent us to detect real issues" [Undecided,New] https://launchpad.net/bugs/409907
Ursinha, is there a bug for OOPS-1307J16?
https://lp-oops.canonical.com/oops.py/?oopsid=1307J16
matsubara, not that I opened one, because we needed to know what was going on over there to open the bug
so mars was going to investigate that
I don't recall having those anymore
[action] ursinha to chase mars about OOPS-1307J16 and file a bug about it
https://lp-oops.canonical.com/oops.py/?oopsid=1307J16
ACTION received: ursinha to chase mars about OOPS-1307J16 and file a bug about it
https://lp-oops.canonical.com/oops.py/?oopsid=1307J16
I think that's all for last meeting's action items
thanks everyone
[TOPIC] * Oops report & Critical Bugs & Broken scripts
New Topic: * Oops report & Critical Bugs & Broken scripts
there are two issues to discuss
one was about bug 409907, that I already mentioned to stub and it's being handled
Launchpad bug 409907 in launchpad-foundations "TransactionRollbackErrors may prevent us to detect real issues" [Undecided,New] https://launchpad.net/bugs/409907
the other is about the select replication_lag() timeouts we're having
mthaddon also reported problems that we don't know if are related to that
I don't know if there's much to be discussed at this point, because it seems we need to fix oops reports first to be able to see the real problem here
is that correct stub: ?
* noodles775 has quit (Read error: 54 (Connection reset by peer))
should we request a CP for the branch that fixes the oops log?
given that we're skipping a release, that's probably a good idea
flacoste, I've spoken with jtv yesterday about those,and he also said that was unlikely to be his changes fault (possible but unlikely)
I landed code today that should tell us more about if the timeout is actually occurring due to blocking on the database, or elsewhere.
s/his/translations/
stub, should we request a CP?
yeah, i really think a CP is a good idea (please please)
[action] stub to request CP for his branch that fixes oops logging
ACTION received: stub to request CP for his branch that fixes oops logging
cool
we have two critical bugs, already fix committed
so, good
cool
about the failing scripts
we had some scripts failing this week
nightly, productreleasefinder and garbo-hourly
and rosetta-poimport too
nightly was already addressed by jtv
matsubara, productreleasefinder isn't expected to fail anymore? sinzui?
as a rosetta script was taking too much time and jtv will remove it from nightly and add a cronjob for it
Ursinha: no, but the errors I see are not failures...the script was not run
stub, do you know why garbo-hourly is failing?
It's failing?
matsubara: many scripts are not running because of one log process
henninge, rosetta-poimport failed on the 5th. can you investigate and reply to the list?
s/log/long/
matsubara, it's not being run, it seems
* noodles775 (n=miken@canonical/launchpad/noodles775) has joined #launchpad-meeting
matsubara: sure, I will.
stub, I got a few emails: "Scripts failed to run: loganberry:garbo-hourly"
Ursinha: matsubara there is some traffic about this. spm reported the long running process a weeks ago. I has asked why the prf had not run and no replies to the list, so I'm asking here
thanks henninge
matsubara, actually stub repklied
*replied
Oh - there were some blocked runs because the rosetta export-to-branch script was running in a 5 hour long transaction
So the script blocks because it doesn't want to make anything worse.
[action] henninge to investigate rosetta-poimport script failure on the Aug 5th and report back to the list
ACTION received: henninge to investigate rosetta-poimport script failure on the Aug 5th and report back to the list
* salgado is now known as salgado-lunch
so I guess it's ok
that's all for this section from me
thanks everyone !
all right.
thanks everyone
you can move on matsubara
[TOPIC] * Operations report (mthaddon/herb/spm)
New Topic: * Operations report (mthaddon/herb/spm)
2009-07-31 - Rolled out r8323 to bzrsyncd
2009-08-05 - Cherry picks for code imports, lpnet* and the script server.
Our monitoring system has been timing out in connecting to the app servers more often this week. Admittedly its timeout is set lower than the OOPS timeout. But we've also been noticing higher load on the app servers as well. This was discussed by Ursinha during the oops/critical bugs/broken scripts section.
There's currently 1 cherry pick and 1 database query awaiting (dis)approval.
The LOSAs currently have 14 bugs marked high and triaged. Only 1 of which is assigned to someone and targeted for a release. We would be grateful if we saw some movement on these.
We're currently running with a single slave in preparation for the sprint next week.
also wanted to check that there should be a cherry pick request for the cowboyed storm change to lpnet9 and lpnet10 (per the production status wiki page)
cowboyed storm change?
flacoste: https://pastebin.canonical.com/20503/ under eggs/storm-0.14salgado_storm_launchpad_288_308-py2.4-linux-i686.egg
mthaddon, herb: i'll look at the LPS to approve/decline
right
herb, do you keep that list of 14 bugs somewhere? in a wiki page or have a tag to group them?
mthaddon: the cherry pick would simply be to update that dependency
matsubara: bugs.launchpad.net/~canonical-losas
flacoste: well in any case, the CP that was requested (and performed) yesterday overwrote it, so it needs to be formalised so other CPs don't overwrite it again
sinzui: can salgado make an appropriate CP request?
Yes
it's simply a new upload to download-cache with a versions.cfg change
sinzui, flacoste, intellectronica, rockstar: Could you take a look at herb's bug list (bugs.launchpad.net/~canonical-losas) and see what your teams can do about the high ones in the short term?
ok
clearly we're not looking for all of them to be fixed by the next meeting (though that would be great ;) just mostly would like to know they're staying on the right radars and are being worked on as appropriate.
cool
anything else for herb?
herb: so, basically, these are mostly bugs which will make life easier for you when fixed?
bug 348722 should become invalid when we update all pmt teams to become true private teams
Launchpad bug 348722 in launchpad-code "Set default branch visibility to "forbidden" if any team set to 'Private'" [High,Triaged] https://launchpad.net/bugs/348722
intellectronica: some of them are genuine operational issues, some of them are quality of life issues for the LOSAs
There should be no private-membership teams at the start of week 1
cool, sure, we'll take a look and see if there's any low hanging fruit
barry will be working with the losas on August 11 to fix bug 325962
Launchpad bug 325962 in launchpad-registry "lp-mailman startup is blocking on a pid file in the wrong directory" [High,Triaged] https://launchpad.net/bugs/325962
sinzui: that was the one that was assigned and targeted at a release.
herb, many times assigned even
heh
all right. I think that's it herb it failed my rules that bug is not high if it is not worked on by all parties in 3 months
thanks
thanks herb and everyone
[TOPIC] * DBA report (stub)
New Topic: * DBA report (stub)
We set off some alerts when the poimport script and PostgreSQL decided that lots of disk space should be used. We see some smaller spikes, which is just PG using disk to store intermediary results, but this time it was large enough to set off the alarms. We have seen this once before, and in neither case have we been able to repeat it. My best hypothesis is the planner statistics triggering a really bad query plan, so I'll bump the planner statistic sample size on the production dbs in case this stops future occurrences.
henninge, maybe the last rosetta-poimport failure was related to that ^
matsubara: I believe we already know what it was about and it may be related to that.
matsubara: I'll talk to the guys.
henninge, cool. thanks
stub, anything else?
Not that I can think of
all right. thank you stub
I guess that's all for today
Thank you all for attending this week's Launchpad Production Meeting. See https://dev.launchpad.net/MeetingAgenda for the logs.
#endmeeting
Meeting finished at 10:44.