Diff for "BugTriage"

Differences between revisions 1 and 31 (spanning 30 versions)
Revision 1 as of 2009-01-05 19:17:00
Size: 9127
Editor: kfogel
Comment: Initial draft.
Revision 31 as of 2011-09-14 20:42:33
Size: 9682
Editor: lifeless
Comment: and note that things preventing detection of other failures are themselves critical
Line 1: Line 1:
= A Practical Guide to Launchpad Bug Triage = = Triaging Launchpad project bugs =
Line 3: Line 3:
''Or: "Why I don't classify bugs as medium"'' ||<tablestyle="float:right; font-size: 0.9em; width:40%; background:#F1F1ED; margin: 0 0 1em 1em;" style="padding:0.5em;"><<TableOfContents>>||
Line 5: Line 5:
(Based on an email from Curtis Hovey.) Our triage process is basically this: make sure that ''Critical'' and ''High'' bugs are correctly marked.
Line 7: Line 7:
The process of triaging issues (bugs, features, and tasks) has one crucial principle: Prioritise the work according to ''need'' and ''certainty''. We want:
Line 9: Line 9:
Work is prioritised because there are not enough engineers to do all the work. Some featured will never be completed, some bugs will never be fixed. Triage determines which bugs can and will be fixed, which features can and will be implemented. ''Need'' is generally understood, when planning work, but ''certainty'' is not, and that often leads to wasted work and unmet expectations.  * ''Critical'' bugs to be those that need attention before all others. Right now: OOPSes, timeouts, regressions, stakeholder-escalated bugs, operational issues (e.g. build breakage, code issues causing deployment failures, things preventing us detecting other failures (e.g. cronspam), things that should oops but a lack of tooling prevents) and bugs which are dependencies of other critical bugs.
 * The ''High'' bugs list to be around six months deep. Many parts of Canonical are on a six-month cycle and fitting in with that is convenient. Some specific sorts of bugs we always treat as high. Right now: A and AA conformance [[PolicyAndProcess/Accessibility|accessibility bugs]].
Line 11: Line 12:
By ''need'', I mean a measure of severity. What percentage of users does the issue affect, and how severely does it impede them from completing their task. We use a [[#quarterly|quarterly review]] to shrink the ''High'' list if it looks like more than six months of work.
Line 13: Line 14:
By ''certainty'', I mean a measure of how certain the engineers are that they can address the issue. Time is also a factor in this measure, the longer an issue takes to address, the more likely that the conditions that were first judged will change. For a full understanding of why we triage bugs and how we came to develop this process, please read our description of the [[/Background|background to our bug triage process]].
Line 15: Line 16:
The act of triage is separating work into groups that are being worked on now, next and last. There can only be as many "now" bugs or features as there are engineers. The number of "next" work is limited to the velocity of the engineers and how infrequently plans change. The bugs that are last will probably never be addressed, the last features may never be started. == How to triage ==
Line 17: Line 18:
The corollary to this rule is that there are a finite number of bugs or features in the first two groups. There cannot be more work in these groups than there are engineers to do for the given period of time; otherwise the engineers, businesses and users are being misinformed about when issues will be addressed. These are the questions we ask when triaging bug reports about Launchpad-project:
Line 19: Line 20:
== An Example ==  1. '''Is this a bug in Launchpad-project?''' If not, move it to the appropriate project (e.g. Ubuntu) and move to the next bug. Note that bugs in lazr.restful, loggerhead etc '''are''' bugs in Launchpad-project.
 1. '''Is this bug on the right subproject?''' If not, move it to the right subproject.
 1. '''Is it a duplicate?''' If there is a duplicate, mark the newer bug as a duplicate of the older bug ([[#duplicates|read more about duplicates]]).
 1. '''Is it something we will not do and would not accept a patch to do?''' If so, mark it as ''Won't Fix''.
 1. '''Is it an operational request?''' If yes, convert it to a question.
 1. '''When are we likely to fix this?''' Set the importance to show when we'll get to fixing this bug ([[#importance|read more about choosing an importance]]).
 1. '''Does the report have enough detail?''' If we couldn't replicate or otherwise begin work on the bug with the information provided, request further information from the reporter, mark it as ''Incomplete'', and move to the next bug. If someone has already asked for more info and the reporter has replied, change the status from ''Incomplete'' to ''Triaged''.
 1. '''Set the status to ''Triaged'''''.
 
As you might expect, we give a triaged bug the ''Triaged'' status.
Line 21: Line 31:
Consider there is one engineer and two bugs. He can only work one bug at a time. One bug is more important than the other. The risk is that he may not be able to fix one of the bugs before users are disappointed and abandon the application. He risks disappointing all users if he does not fix either bug because he choose the one with the most need over the one he was certain he could address. If you're uncertain what importance to give a bug, chat with another engineer. If there's a disagreement, let common sense and courtesy take priority.
Line 23: Line 33:
If he does not know how to fix the bug with the most need, or that the fix takes a long time, he is wasting time he could have spent fixing the bug with more certainty. The only way he can address the bug with the most need is to employ a hack to reduce the need, to meet the expectations of some users. The hack is also used to gain time to understand the problem, thus increase certainty. Need help? [[Help|Talk to someone]].

== Quick links ==

||<tablestyle="width: 60%;" style="background: #2a2929; font-weight: bold; color: #f6bc05;">All of Launchpad||[[https://bugs.launchpad.net/launchpad-project|All]]||[[https://bugs.launchpad.net/launchpad-project/+bugs?search=Search&field.status=New|New]]||[[https://bugs.launchpad.net/launchpad-project/+bugs?field.status:list=NEW&field.status:list=INCOMPLETE_WITH_RESPONSE&field.status:list=INCOMPLETE_WITHOUT_RESPONSE&field.status:list=CONFIRMED|Untriaged bugs with no importance]]||[[https://bugs.launchpad.net/launchpad-project/+bugs?field.status:list=NEW&field.status:list=INCOMPLETE_WITH_RESPONSE&field.status:list=INCOMPLETE_WITHOUT_RESPONSE&field.status:list=CONFIRMED|Untriaged bugs that have a status]]||[[https://bugs.launchpad.net/launchpad-project/+bugs?field.searchtext=&orderby=-importance&search=Search&field.status%3Alist=TRIAGED&assignee_option=any&field.assignee=&field.bug_reporter=&field.bug_supervisor=&field.bug_commenter=&field.subscriber=&field.tag=&field.tags_combinator=ANY&field.has_cve.used=&field.omit_dupes.used=&field.omit_dupes=on&field.affects_me.used=&field.has_patch.used=&field.has_branches.used=&field.has_branches=on&field.has_no_branches.used=&field.has_no_branches=on&field.has_blueprints.used=&field.has_blueprints=on&field.has_no_blueprints.used=&field.has_no_blueprints=on|Triaged]]||[[https://bugs.launchpad.net/launchpad-project/+bugs?search=Search&field.importance=Critical&field.status=New&field.status=Incomplete&field.status=Confirmed&field.status=Triaged&field.status=In+Progress&field.status=Fix+Committed|Critical]]||

<<Anchor(importance)>>
= Importance =

We use three of Launchpad's bug importances and give each a specific meaning.

||<tablestyle="width: 60%;" rowstyle="background: #2a2929; font-weight: bold; color: #f6bc05;">~+Importance+~||~+Meaning+~||
||<style="font-weight: bold; color: #e01010;"> ~+{{attachment:bug-critical.png}} Critical+~||Bugs that need to jump the queue. When all is well, we should have no Critical bugs.||
||<style="color: #f96413; font-weight: bold;">~+{{attachment:bug-high.png}} High+~||Bugs that are likely to get attention in the next six months.||
||<style="color: #d1d03c; font-weight: bold;">~+{{attachment:bug-low.png}} Low+~||All other bugs.||

The importance of a particular bug report reflects the priorities of the Launchpad project. Individuals working on Launchpad may have different priorities. ([[#selecting|Read more about selecting bugs to work on]])

<<Anchor(critical)>>
== Critical ==

Any bug marked ''Critical'' takes priority over all other bugs.

At present, timeouts, OOPSes (thanks to our [[https://dev.launchpad.net/PolicyAndProcess/ZeroOOPSPolicy|zero OOPS policy]]), security bugs, regressions (including supported-browser issues) and stakeholder escalations are all marked as ''Critical''. Non-security bugs should also be tagged "oops", "regression", etc. so that the reason for their importance is clear. Other types of bug may also be ''Critical''; Francis or Robert will expect you to justify marking any other type of bug as ''Critical''.

If all is well with Launchpad, there should be no ''Critical'' bugs.
Line 26: Line 61:
== Only Assign Work that You Are Committing to do in the Near Future == <<Anchor(high)>>
== High ==
Line 28: Line 64:
When work is assigned to an engineer, he is committing to complete the work in the near future. What the "near future" means is different for each project. I suggest 3 releases is the "near future", because when work is planned, the engineer is thinking about now, next, and last. For some projects this period might be 6 weeks, for others, 6 months. These are bugs that we believe we will work on in the next six months, plus A and AA conformance [[PolicyAndProcess/Accessibility|accessibility bugs]].
Line 30: Line 66:
I prefer to plan for the current release, and the next one. As work is reprioritised, it may be rescheduled to the third release. I do not think it is wise to plan a bug or feature to be completed in the third releases because if it slips to the fourth or fifth released, I doubt the it was correctly prioritized as high. <<Anchor(low)>>
== Low ==
Line 32: Line 69:
Any high work that is assigned to a engineer for more than 3 releases was not high. If it were, the work would have been reassigned to someone who could complete it in the scheduled time. Any other work that is assigned for more than 1 release is also misprioritised. You are lying to yourself, and the the project's users, when you assign work that you are not committing to fixing. We mark as ''Low'' any bug that we recognise as legitimate but that is '''not''' scheduled for Canonical staff to fix in the next 6 months. This is not the same as planning not to fix the bug; it means that we don't know when we will fix it, if at all. This includes AAA conformance [[PolicyAndProcess/Accessibility|accessibility bugs]].
Line 34: Line 71:
== Practical Classifications of Importance == == Others ==
Line 36: Line 73:
Work is often classified in relative terms. It is better to classify work according to how it is managed, to convey when and under what terms the bug will be fixed or a feature will be complete. There are three priorities that work can be classified as: We do not use ''Medium'' or ''Wishlist''. This is primarily to avoid giving false hope to people who are interested in a bug that is neither ''Critical'' nor ''High'': if it does not have one of these importances, we think it is unlikely we will fix it in the next six months.
Line 38: Line 75:
Critical:: The bug dramatically impairs users. Users may lose their data. Users cannot complete crucial tasks. The feature is needed to encourage adoption or prevent abandonment of the project. = Tagging bugs =
Line 40: Line 77:
    Synonyms: required, now, must do We tag bugs as part of the triage process. Read the [[https://dev.launchpad.net/LaunchpadBugTags|list of Launchpad tags]] to find out which tags to use.
Line 42: Line 79:
    The work is immediately assigned to a engineer. It is his top priority to fix. Team members help the engineer to plan and do the work. The work is released as soon as it is deployable; in the case of a bug, it is released outside of the release schedule. = Assigning bugs =
Line 44: Line 81:
High:: The bug prevents users from completing their tasks. The feature provides new kinds of tasks or new ways of completing tasks. We do not assign bugs as part of the triage process. Only ''In progress'' bugs should be assigned to someone.
Line 46: Line 83:
    Synonyms: expected, next, can do, should do Even ''Critical'' bugs do not need an assignee, unless they are being worked on. Being at the top of the queue is all we need for ''Critical'' bugs to get the attention they require.
Line 48: Line 85:
    The work is assigned to a engineer to be completed in the next 3 releases. The engineer may choose to do other work if he believes it is within the scope of the high priority work. <<Anchor(selecting)>>
= Selecting bugs to work on =
Line 50: Line 88:
Low:: The bug is an inconvenience to users, but it does not prevent them from completing their tasks. The feature is a convenience to users. If you are working on Launchpad in your own time you'll most likely want to fix those bugs that matter to you, regardless of what importance the Launchpad project gives them. That's great and we welcome all bug fixes; we encourage you to look at [[FixBugs|our page about fixing bugs]] first.
Line 52: Line 90:
    Synonyms: optional, last, may do Members of Canonical's Launchpad team will select bugs depending on whether they're in a maintenance or feature squad.
Line 54: Line 92:
    The engineer may assign the work to himself while working on a high priority work because the high work provides an opportunity to complete the low priority work at less cost. If the low work in any way jeopardises the high priority work, the low work is unassigned. The engineer is thus ''certain'' that the work can be fixed quickly and without difficulty. A corollary to this rule is that low work that is assigned to a engineer must be "in progress" or "fixed" states. Generally speaking, squads on feature-rotation will consider the importance of a bug only after filtering for work that applies directly to their current feature.
Line 56: Line 94:
Maintenance squads, however, will usually be working from the bug database: picking bugs based on their triaged importance. They should look at each importance in order &mdash; critical, high, low &mdash; and from within that bucket take one of the oldest bugs. Crucially though, there should be no ''Critical'' bugs before they start work on ''High'' or ''Low'' bugs. Engineers should prefer ''High'' bugs over ''Low'' bugs, but may use their discretion.
Line 57: Line 96:
== The Problem with "Medium" == <<Anchor(quarterly)>>
= Quarterly review =
Line 59: Line 99:
It might be argued that when the engineer has an opportunity to fix a low or a medium bug, he must choose the medium one. This rules does not define a practical distinction between medium and low. There is no commitment to fix the medium bug; it will not be scheduled for fixing. A engineer chooses to undertake a low bug because he sees an opportunity to fix it while working in the affected code. The engineer is choosing to do unscheduled work because he is ''certain'' it does not jeopardise his scheduled work. The engineer might see an opportunity to fix a medium and a low bug at the same time, but that is unlikely. Four times a year, we put all of the ''High'' bugs back through the triage process. This lets us make sure that all those bugs really should be ''High'' and to take account of anything that has changed since they were last triaged.
Line 61: Line 101:
It can also be argued that 'critical' is 'high' and that 'high' is 'medium'. True, that is a matter of semantics. The crux of the issue is that there are three practical classifications of work. The words chosen to describe the classifications could use the tofu scale of hard, firm, and soft. People who are unfamiliar with triage will appreciate names that convey the kind of attention the issue will receive. = Resolving disputes =
Line 63: Line 103:
Beyond these rules, a bug is more important than another bug if fixing it will improve Launchpad more than fixing the other bug would.
Line 64: Line 105:
== Consequences of Misprioritised Work == Discretion and a feel for what's in the bug database will help a lot here, as will awareness of our userbase and their needs. One sensible heuristic is to look at five to ten existing ''High'' bugs and, if the new bug is less important than all of them, mark it ''Low'' as it's probably less important than all existing ''High'' bugs.
Line 66: Line 107:
Stakeholders often use reports that list the prioritised work for a release and for each engineer. When work is misclassified there are two commonly observed consequences: a decrease in certainty, and a decrease in communication. Engineers have discretion to decide that any particular bug should be sorted higher (or lower) than it has been; some change requests are very important to many of our users while still not big enough to need a dedicated feature-squad working on them.
Line 68: Line 109:
In the first consequence, the engineer's effort may be wasted; there are issues that have more ''need and certainty''. Engineers, and other stakeholders, are often tempted to complete the misdirected work after the misclassification is discovered because it is assumed that it is better to always deliver something finished than nothing at all. This is a risky choice, because it jeopardises work in future releases. By working on less important work, the engineer is decreasing the certainty of the more important work.

The second consequence is that the engineer ignores the list and he works on issues according to some other source, such as the opinion of another stakeholder. While the engineer is working on the correct issue, it is unclear to other parties what work is going on and when will it be completed. Users may abandon the project in frustration. Planners cannot coordinate all the stakeholders.

The first consequence is possibly a failure to do re-prioritisation during the triage process, but the second consequence is a total failure in the triage process. Why would anyone do triage if the prioritisation will be ignored? How can work be coordinated if the work is unknown to all stakeholders? Why would users trust a project if it does not do what it says it will do?

Work must be reprioritised during the triage process to ensure that engineers are working on the issues with the most need and certainty. Engineers must work from the list of prioritised issues.

== Indicators of Misprioritised Work ==

The rules of practical classification provide tests for misprioritised bugs, features, or tasks.

 * The work is critical, but it is not assigned and targeted for release.
 * The work is prioritised as high, but it is not assigned and targeted for a release.
 * The work is high, but has not been worked on in 3 releases.
 * The work is low and unassigned, yet it is targeted for a release.
 * The work is low and assigned, but the engineer is not working on it.
 * The work is considered to be triaged, but its priority is not critical, high, or low.
 * An engineer is assigned more work than he can accomplish in 3 releases, and it cannot be reassigned.
When two engineers disagree, or if someone in the management chain disagrees, common sense and courtesy should be used in resolving the disagreement.

Triaging Launchpad project bugs

Our triage process is basically this: make sure that Critical and High bugs are correctly marked.

We want:

  • Critical bugs to be those that need attention before all others. Right now: OOPSes, timeouts, regressions, stakeholder-escalated bugs, operational issues (e.g. build breakage, code issues causing deployment failures, things preventing us detecting other failures (e.g. cronspam), things that should oops but a lack of tooling prevents) and bugs which are dependencies of other critical bugs.

  • The High bugs list to be around six months deep. Many parts of Canonical are on a six-month cycle and fitting in with that is convenient. Some specific sorts of bugs we always treat as high. Right now: A and AA conformance accessibility bugs.

We use a quarterly review to shrink the High list if it looks like more than six months of work.

For a full understanding of why we triage bugs and how we came to develop this process, please read our description of the background to our bug triage process.

How to triage

These are the questions we ask when triaging bug reports about Launchpad-project:

  1. Is this a bug in Launchpad-project? If not, move it to the appropriate project (e.g. Ubuntu) and move to the next bug. Note that bugs in lazr.restful, loggerhead, etc. are bugs in Launchpad-project.

  2. Is this bug on the right subproject? If not, move it to the right subproject.

  3. Is it a duplicate? If there is a duplicate, mark the newer bug as a duplicate of the older bug (read more about duplicates).

  4. Is it something we will not do and would not accept a patch to do? If so, mark it as Won't Fix.

  5. Is it an operational request? If yes, convert it to a question.

  6. When are we likely to fix this? Set the importance to show when we'll get to fixing this bug (read more about choosing an importance).

  7. Does the report have enough detail? If we couldn't replicate or otherwise begin work on the bug with the information provided, request further information from the reporter, mark it as Incomplete, and move to the next bug. If someone has already asked for more info and the reporter has replied, change the status from Incomplete to Triaged.

  8. Set the status to Triaged.

As you might expect, we give a triaged bug the Triaged status.
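
The questions above can be read as a small decision procedure. The following is only a rough sketch of that reading: the `Report` fields and the `triage` function are hypothetical names invented for illustration, and the choice to stop after marking a duplicate, a Won't Fix, or an operational request is an assumption (the list only says "move to the next bug" explicitly for questions 1 and 7).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Report:
    """Hypothetical record of a triager's answers for one bug report."""
    in_launchpad_project: bool = True
    on_right_subproject: bool = True
    duplicate_of: Optional[int] = None   # id of the older bug, if any
    would_reject_patch: bool = False     # we will not do it, nor accept a patch
    is_operational_request: bool = False
    importance: str = "Low"              # "Critical", "High" or "Low"
    has_enough_detail: bool = True


def triage(report: Report) -> List[str]:
    """Return the actions for one report, asking the questions in order."""
    if not report.in_launchpad_project:
        return ["move to the appropriate project"]
    actions = []
    if not report.on_right_subproject:
        actions.append("move to the right subproject")
    # Assumption: the next three outcomes end triage for this report.
    if report.duplicate_of is not None:
        return actions + [f"mark as duplicate of bug {report.duplicate_of}"]
    if report.would_reject_patch:
        return actions + ["set status to Won't Fix"]
    if report.is_operational_request:
        return actions + ["convert to a question"]
    actions.append(f"set importance to {report.importance}")
    if not report.has_enough_detail:
        return actions + ["request more information", "set status to Incomplete"]
    return actions + ["set status to Triaged"]
```

For example, `triage(Report(importance="High"))` yields the two actions "set importance to High" and "set status to Triaged".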

If you're uncertain what importance to give a bug, chat with another engineer. If there's a disagreement, let common sense and courtesy take priority.

Need help? Talk to someone.

Quick links

All of Launchpad: All, New, Untriaged bugs with no importance, Untriaged bugs that have a status, Triaged, Critical
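
The Critical quick link is an ordinary bug-search URL on bugs.launchpad.net. As a minimal sketch, the same query string can be assembled with Python's standard library. The parameter names (`search`, `field.importance`, `field.status`) and status values are copied from the link in the wiki source of this page; the helper name `search_url` is made up for illustration.

```python
from urllib.parse import urlencode

BUGS_URL = "https://bugs.launchpad.net/launchpad-project/+bugs"

# Statuses used by the "Critical" quick link in the wiki source.
OPEN_STATUSES = [
    "New", "Incomplete", "Confirmed", "Triaged", "In Progress", "Fix Committed",
]


def search_url(importance, statuses):
    """Build a bug-search URL; repeated field.status keys select several statuses."""
    params = [("search", "Search"), ("field.importance", importance)]
    params += [("field.status", status) for status in statuses]
    return f"{BUGS_URL}?{urlencode(params)}"


critical_url = search_url("Critical", OPEN_STATUSES)
```

Passing a list of pairs rather than a dict keeps the repeated `field.status` keys, which a dict would collapse to one.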

Importance

We use three of Launchpad's bug importances and give each a specific meaning.

  • Critical: Bugs that need to jump the queue. When all is well, we should have no Critical bugs.

  • High: Bugs that are likely to get attention in the next six months.

  • Low: All other bugs.

The importance of a particular bug report reflects the priorities of the Launchpad project. Individuals working on Launchpad may have different priorities. (Read more about selecting bugs to work on)

Critical

Any bug marked Critical takes priority over all other bugs.

At present, timeouts, OOPSes (thanks to our zero OOPS policy), security bugs, regressions (including supported-browser issues) and stakeholder escalations are all marked as Critical. Non-security bugs should also be tagged "oops", "regression", etc. so that the reason for their importance is clear. Other types of bug may also be Critical; Francis or Robert will expect you to justify marking any other type of bug as Critical.

If all is well with Launchpad, there should be no Critical bugs.

High

These are bugs that we believe we will work on in the next six months, plus A and AA conformance accessibility bugs.

Low

We mark as Low any bug that we recognise as legitimate but that is not scheduled for Canonical staff to fix in the next 6 months. This is not the same as planning not to fix the bug; it means that we don't know when we will fix it, if at all. This includes AAA conformance accessibility bugs.

Others

We do not use Medium or Wishlist. This is primarily to avoid giving false hope to people who are interested in a bug that is neither Critical nor High: if it does not have one of these importances, we think it is unlikely we will fix it in the next six months.

Tagging bugs

We tag bugs as part of the triage process. Read the list of Launchpad tags to find out which tags to use.

Assigning bugs

We do not assign bugs as part of the triage process. Only In progress bugs should be assigned to someone.

Even Critical bugs do not need an assignee, unless they are being worked on. Being at the top of the queue is all we need for Critical bugs to get the attention they require.

Selecting bugs to work on

If you are working on Launchpad in your own time you'll most likely want to fix those bugs that matter to you, regardless of what importance the Launchpad project gives them. That's great and we welcome all bug fixes; we encourage you to look at our page about fixing bugs first.

Members of Canonical's Launchpad team will select bugs depending on whether they're in a maintenance or feature squad.

Generally speaking, squads on feature-rotation will consider the importance of a bug only after filtering for work that applies directly to their current feature.

Maintenance squads, however, will usually be working from the bug database: picking bugs based on their triaged importance. They should look at each importance in order — critical, high, low — and from within that bucket take one of the oldest bugs. Crucially though, there should be no Critical bugs before they start work on High or Low bugs. Engineers should prefer High bugs over Low bugs, but may use their discretion.
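
As an illustration of that ordering, here is a small sketch of how a maintenance engineer's "next bug" might be chosen. The `Bug` record and `next_bug` function are hypothetical, and the sketch is stricter than the text: it always returns the single oldest bug in the highest non-empty bucket, whereas the process allows "one of the oldest" and leaves High versus Low to discretion.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Buckets in the order a maintenance squad looks at them.
IMPORTANCE_ORDER = ["Critical", "High", "Low"]


@dataclass
class Bug:
    """Hypothetical minimal bug record."""
    id: int
    importance: str
    created: date


def next_bug(bugs: List[Bug]) -> Optional[Bug]:
    """Return the oldest bug from the first non-empty importance bucket."""
    for importance in IMPORTANCE_ORDER:
        bucket = [bug for bug in bugs if bug.importance == importance]
        if bucket:
            return min(bucket, key=lambda bug: bug.created)
    return None  # nothing left to work on
```

Because Critical is checked first, no High or Low bug is returned while any Critical bug remains, which matches the rule that Critical bugs must be cleared before other work starts.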

Quarterly review

Four times a year, we put all of the High bugs back through the triage process. This lets us make sure that all those bugs really should be High and take account of anything that has changed since they were last triaged.

Resolving disputes

Beyond these rules, a bug is more important than another bug if fixing it will improve Launchpad more than fixing the other bug would.

Discretion and a feel for what's in the bug database will help a lot here, as will awareness of our userbase and their needs. One sensible heuristic is to look at five to ten existing High bugs and, if the new bug is less important than all of them, mark it Low as it's probably less important than all existing High bugs.
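
The sampling heuristic can be sketched as follows. This is only an illustration: `suggest_importance` is a hypothetical helper, and `less_important` stands in for the triager's own judgement, which no code can supply. The sketch only ever suggests Low; when the new bug is not less important than every sampled High bug it returns `None`, since the heuristic says nothing about that case.

```python
import random
from typing import Callable, Optional, Sequence, TypeVar

B = TypeVar("B")


def suggest_importance(
    new_bug: B,
    high_bugs: Sequence[B],
    less_important: Callable[[B, B], bool],
    sample_size: int = 10,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Suggest "Low" if new_bug is judged less important than every sampled High bug."""
    rng = rng or random.Random()
    sample = rng.sample(list(high_bugs), min(sample_size, len(high_bugs)))
    # With no High bugs to compare against there is no basis for a suggestion.
    if sample and all(less_important(new_bug, existing) for existing in sample):
        return "Low"
    return None
```

The source suggests comparing against five to ten bugs; `sample_size` defaults to the upper end of that range.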

Engineers have discretion to decide that any particular bug should be sorted higher (or lower) than it has been; some change requests are very important to many of our users while still not big enough to need a dedicated feature-squad working on them.

When two engineers disagree, or if someone in the management chain disagrees, common sense and courtesy should be used in resolving the disagreement.

BugTriage (last edited 2022-04-19 09:36:39 by lgp171188)