#format IRC
#startmeeting
Welcome to this week's Launchpad Production Meeting. For the next 45 minutes or so, we'll be coordinating the resolution of specific Launchpad bugs and issues.
[TOPIC] Roll Call
Not on the Launchpad Dev team? Welcome! Come "me" with the rest of us!
Meeting started at 10:00. The chair is Ursinha.
Commands Available: [TOPIC], [IDEA], [ACTION], [AGREED], [LINK], [VOTE]
New Topic: Roll Call
me
me
!
me
ni!
me
lol
I demand a SHRUBBERY!
me
matsubara, hi
(the rest of the Translations team is out on vacation. Presumably I'll get a T-shirt)
lol
bigjools, hi
me, still on TL call
okay, nothing to soyuz today from oops land
stub, hi
me
me
matsubara, welcome back
thanks Ursinha
[TOPIC] Agenda
* Actions from last meeting
* Oops report & Critical Bugs & Broken scripts
* Operations report (mthaddon/herb/spm)
* DBA report (stub)
New Topic: Agenda
[TOPIC] * Actions from last meeting
* matsubara to chase rockstar about failure on updatebranches script
* stub to get RC for branch that fixes bug 403283
    * landed in r8319
* ursinha do file bug for OOPS-1300XMLP5
    * filed https://bugs.edge.launchpad.net/launchpad-foundations/+bug/403606
* Ursinha to keep one eye on UnicodeDecodeErrors, and will report back next meeting
    * we're having less errors, Salgado will try to fix bug 61171
* mars to take a look at bug 354593
* matsubara to chase salgado about people pruning script
New Topic: * Actions from last meeting
Bug 403283 on http://launchpad.net/bugs/403283 is private
https://lp-oops.canonical.com/oops.py/?oopsid=1300XMLP5
Launchpad bug 403606 in launchpad-foundations "ExpatError errors should be handled to not generate the OOPSes" [Undecided,New]
Bug 61171 on http://launchpad.net/bugs/61171 is private
Launchpad bug 354593 in launchpad-foundations "SSO exceptions views need proper branding" [High,Triaged] https://launchpad.net/bugs/354593
Ursinha: We're having "fewer" errors :-P
The person pruner is running on production. It will take 3 or 4 weeks to complete.
Ursinha, I haven't had time to do that this week as I was on vacation
thanks jtv :P
matsubara, okay
please, re-add to the list and I'll chase
[action] matsubara to chase rockstar about failure on updatebranches script
stub, wow
ACTION received: matsubara to chase rockstar about failure on updatebranches script
Ursinha, the other one about the pruning script too
[action] matsubara to chase salgado about people pruning script
ACTION received: matsubara to chase salgado about people pruning script
Chase what?
it is running.
but what stub said?
yes, yes
Yes, what stub said.
matsubara, ^
stub, don't we need to pass --experimental to it?
It is - I filed an rt and chased it through with spm. And monitoring too
ok, so that's sorted then :-)
[action] remove last matsubara item as stub already reported it's ok]
ACTION received: remove last matsubara item as stub already reported it's ok]
:-)
thanks Ursinha
matsubara, np :)
mars, did you have the time to look at bug 354593?
Launchpad bug 354593 in launchpad-foundations "SSO exceptions views need proper branding" [High,Triaged] https://launchpad.net/bugs/354593
* Ursinha wonders if mars is on the TL call as well
nope
Why is there a TL call this early?
*whew*
I did not have a chance to address it
oops section is all foundations today
I doubt my TL is on that TL call...
he's not
well, mars, can you do that then, please? :)
stub, would you be able to take the SSO bug? Or find someone who has time to address it?
The branding one?
stub, yes sir
yes
stub, bug 354593
Launchpad bug 354593 in launchpad-foundations "SSO exceptions views need proper branding" [High,Triaged] https://launchpad.net/bugs/354593
I can try. I haven't done UI stuff for a long time though so it will be slow.
mars, ^
Ursinha, ?
stub, I can give you a hand with it
okay then
[action] stub to give a try on bug 354593 with mars help if needed
Launchpad bug 354593 in launchpad-foundations "SSO exceptions views need proper branding" [High,Triaged] https://launchpad.net/bugs/354593
ACTION received: stub to give a try on bug 354593 with mars help if needed
moving on
[TOPIC] * Oops report & Critical Bugs & Broken scripts
mars, I have two bugs and one oops for foundations
mars, can you triage bug 403606, and take a look at it, if possible?
mars, also, OOPS-1307J16 shows an AssertionError without any description
and finally mars, do you know if someone could have some time to fix bug 310818? we had some weird timeouts today and the oops report was borked
New Topic: * Oops report & Critical Bugs & Broken scripts
Launchpad bug 403606 in launchpad-foundations "ExpatError errors should be handled to not generate the OOPSes" [Undecided,New] https://launchpad.net/bugs/403606
https://lp-oops.canonical.com/oops.py/?oopsid=1307J16
Launchpad bug 310818 in launchpad-foundations "Oops report does not always log timed-out query" [High,Triaged] https://launchpad.net/bugs/310818
Ursinha, I'll have to ask the team
mars, about what exactly? :)
btw, thanks for helping with the Unicode issues debugging last week, was very useful
Ursinha, to see who on foundations has time to address the issue - I would guess gary or stub, but I don't know how heavily committed they are at the moment
mars, the bug 310818?
Launchpad bug 310818 in launchpad-foundations "Oops report does not always log timed-out query" [High,Triaged] https://launchpad.net/bugs/310818
I was going to look at it but someone designated me victim for the SSO exception stuff ;)
haha
Heh, It feels like I have a backlog to kingdom come, and I keep getting more. :-) But if this is urgent, then it's urgent
I can probably do both - what should be done first Ursinha?
stub, the oops one, I'd suggest
* mrevell (n=matthew@canonical/launchpad/mrevell) has joined #launchpad-meeting
as I said earlier we had some weird timeouts and it would be nice to be able to debug them if they happen again
[action] stub to fix bug 310818
Ursinha, sorry, I was reading the Expat error one. That sounded like something gary would be able to address, since it starts in the heart of Zope, and has to do with exception bubbling through the architecture
Launchpad bug 310818 in launchpad-foundations "Oops report does not always log timed-out query" [High,Triaged] https://launchpad.net/bugs/310818
ACTION received: stub to fix bug 310818
mars, right.
gary_poster, want to fix that? :)
gary_poster, ^ does that make sense?
looking
Ursinha, I'll look at OOPS-1307J16
https://lp-oops.canonical.com/oops.py/?oopsid=1307J16
[action] mars to take a look at OOPS-1307J16
https://lp-oops.canonical.com/oops.py/?oopsid=1307J16
ACTION received: mars to take a look at OOPS-1307J16
thanks mars
https://lp-oops.canonical.com/oops.py/?oopsid=1307J16
mars, I'd open a bug but had no clue about what happened
Ursinha, not precisely clear on the goal but we can talk later. I'm guessing this is low priority. I'll take it. (bug 403606)
Launchpad bug 403606 in launchpad-foundations "ExpatError errors should be handled to not generate the OOPSes" [Undecided,New] https://launchpad.net/bugs/403606
thanks gary_poster
gary_poster, the point is that it fills the oops reports with those
it would be great to get rid of them
and they just mean that somebody is sending us bad XMLRPC, afaict, and we don't want to care?
gary_poster, it would be nice if the ExpatError was, say, turned into a 400 HTTP status code
right
creating a section on the summary or moving them to dev/null was told to be not a good idea (I agree)
ack, ok.
gary_poster, do you think it's to painful to fix?
*too
[action] Ursinha to learn to type
ACTION received: Ursinha to learn to type
Ursinha: It will involve changing zope publication machinery. Doing so will mean either hacking our zope tree, which we are really trying not to do; or migrating to a newer version of the publication machinery, which should wait on the Zope-buildbot work that I keep not finishing, and then will be a migration exercise, possibly accompanied with a negotiate-with-upstream exercise.
lol
so... the only quick and easy fix is the hack
I see.....
that I would be hoping to eliminate RSN when I move all the zope stuff to eggs
gary_poster, so, a very temporary hack?
so I'm happy to have that be a bug, but I'd like it to be a back-burner bug, myself
right
I certainly understand the pain though :-(
Ursinha, sound good? We can review the solution outside the meeting
or have a hint of it at least
mars, sure
thanks gary_poster
[action] Discuss the solution proposed by gary_poster after the meeting, about ExpatErrors and bug 403606
Launchpad bug 403606 in launchpad-foundations "ExpatError errors should be handled to not generate the OOPSes" [Undecided,New] https://launchpad.net/bugs/403606
ACTION received: Discuss the solution proposed by gary_poster after the meeting, about ExpatErrors and bug 403606
thanks mars and gary_poster
well
thanks mars and Ursinha :-)
:)
we're not having more InterfaceErrors (thanks salgado), but we're still having OperationalErrors (OOPS-1306J75, OOPS-1306J96) and DisconnectionErrors (OOPS-1306I440, OOPS-1306J343)
https://lp-oops.canonical.com/oops.py/?oopsid=1306J75
https://lp-oops.canonical.com/oops.py/?oopsid=1306J96
https://lp-oops.canonical.com/oops.py/?oopsid=1306I440
https://lp-oops.canonical.com/oops.py/?oopsid=1306J343
what can we do about it? I think that's yours too mars
* noodles775_ (n=miken@g225068254.adsl.alicedsl.de) has joined #launchpad-meeting
Ursinha, I'll need to look at them (obviously :)
:)
stub, do you have any clues?
stub, that looks like something for you - Storm barfing while talking over the internal network?
gary_poster: You could probably switch of the oops in errorlog.py - we already skip certain well defined exceptions entirely.
stub: oh, ok. not familiar with that. sounds good on the face of it.
stub, gary_poster, well, that's why I suggested wrapping the ExpatError in a 400 status - it's the nice HTTP thing to do, since the client is in the wrong, not us.
Ursinha: All 'due to administrator command' are when the server killed the connection, usually because it was sitting idle too long.
mars, can't do that without getting into the publication machinery. stub's approach is not as elegant as what you suggest, but much more doable in the short term.
stub, so too much lag between opening the app server request and actually getting to issue commands?
hm, unless there's an indirection lurking around...checking...
gary_poster, ok
mars: For an appserver request, we kill anything that doesn't complete in 2 minutes. For scripts, we kill anything that is idle-in-transaction for 90 minutes (or something like that).
mars: So a massive slowdown or pause anytime after the db transaction starts
stub, ok, would you be able to do an analysis, or help me do one, after the meeting? to find what is taking so long to execute
I already looked at this one - I have no idea why that query would stop (the OOPS i'm looking at Ursinha pointed me at earlier)
you know what the problem is, but I would have to troll the OOPSes to find the root cause
stub, did I?
A number of requests timed out all at exactly the same time. I doubt we can reproduce it, and there doesn't seem enough information to diagnose it.
stub, that was another oops, I guess
Yup. But the one I'm looking at is select replication_lag() again - this time it took so long it got terminated by the reaper.
stub, oh, I see
so they are related
stub, because of having not enough data in that oops you can't diagnose the Errors?
*Errors
mars, stub, we're running out of time, can we discuss that on -dev after the meeting?
Ursinha, sure
allright
[action] mars and stub to discuss the Disconnection and OperationalErrors after the meeting
* noodles775 has quit (Read error: 110 (Connection timed out))
ACTION received: mars and stub to discuss the Disconnection and OperationalErrors after the meeting
critical bugs: we have bug 403283, that is in progress as commented on it
Bug 403283 on http://launchpad.net/bugs/403283 is private
* noodles775_ is now known as noodles775
moving on! thanks a lot guys
[TOPIC] * Operations report (mthaddon/herb/spm)
New Topic: * Operations report (mthaddon/herb/spm)
2009-07-28 - Rolled critical fixes to the app servers and scripts server.
We've had some issues with codebrowse in the last week where the process dies but doesn't leave a core file. It's also not leaving anything interesting in the logs. We haven't filed a bug yet, but expect one soon. Any help in determining the best way to debug the issue would be helpful.
mthaddon has a 2nd librarian instance up and running. We're preparing to load balance between them.
mthaddon updated bug #403283 with some questions and it appears stub has responded to them.
Bug 403283 on http://launchpad.net/bugs/403283 is private
There aren't any pending queries or cherry pick requests.
That's it for the LOSAs unless there are questions.
anything for herb?
thanks herb!
moving on
[TOPIC] * DBA report (stub)
New Topic: * DBA report (stub)
thanks Ursinha
I generated a new database baseline from the production database and landed it. We do this occasionally to ensure that the version of the db we are developing on matches what is actually running on production (patches made to the live system not backported to the trunk or stuffups can cause drift).
If you have an approved but unlanded database patch you will need a new database patch number from me.
stub, oot? :)
oot
sweet
anyone wants to say something?
thanks stub :)
I have something to say
to all contacts
we want to close the 2.2.7 milestone, so, if you have pending bugs or blueprints, please, retarget of fix release them
intellectronica, I saw that bugs team has a lot (really) of not assigned bugs targeted to 2.2.7
Ursinha: sure, will make sure we sort them out now
* jtv has quit (Read error: 60 (Operation timed out))
thanks a lot intellectronica and all
guys you rock
Thank you all for attending this week's Launchpad Production Meeting. See https://dev.launchpad.net/MeetingAgenda for the logs.
#endmeeting
Meeting finished at 10:47.
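
Editor's sketch, not part of the meeting log. The discussion of bug 403606 weighed two ways of keeping malformed XML-RPC requests out of the OOPS reports: answering with a 400 HTTP status (mars's suggestion) or skipping the exception when reports are written (stub's suggestion). A minimal Python illustration of both ideas follows, using only the standard library. The names `IGNORED_EXCEPTIONS`, `should_report`, and `handle_xmlrpc_body` are hypothetical and are not taken from Launchpad's `errorlog.py` or from the Zope publication machinery, which this sketch does not attempt to model.

```python
import xmlrpc.client
from xml.parsers.expat import ExpatError

# Hypothetical skip list: exception types that indicate a client mistake
# and therefore should not produce an error report.
IGNORED_EXCEPTIONS = (ExpatError,)


def should_report(exc):
    """Return True if an error report should be written for exc."""
    return not isinstance(exc, IGNORED_EXCEPTIONS)


def handle_xmlrpc_body(body):
    """Parse an XML-RPC request body and return (http_status, detail).

    Malformed XML is treated as a client error (400) rather than being
    allowed to surface as an unhandled server-side exception.
    """
    try:
        _params, method = xmlrpc.client.loads(body)
    except ExpatError as exc:
        return 400, "Malformed XML-RPC request: %s" % exc
    return 200, method
```

For example, `handle_xmlrpc_body(b"<methodCall><oops")` yields a 400 status, while a body built with `xmlrpc.client.dumps((1, 2), "add")` yields `(200, "add")`. As noted in the meeting, the skip-list approach is the less elegant of the two but needs no change to request handling.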