Diff for "BugTriage"

Differences between revisions 1 and 40 (spanning 39 versions)
Revision 1 as of 2009-01-05 19:17:00
Size: 9127
Editor: kfogel
Comment: Initial draft.
Revision 40 as of 2022-04-19 09:36:39
Size: 9412
Editor: lgp171188
Comment: Added more details to clarify moving a triaged bug to the appropriate project or distribution
Line 1: Line 1:
= A Practical Guide to Launchpad Bug Triage = = Triaging Launchpad project bugs =
Line 3: Line 3:
''Or: "Why I don't classify bugs as medium"'' ||<tablestyle="float:right; font-size: 0.9em; width:40%; background:#F1F1ED; margin: 0 0 1em 1em;" style="padding:0.5em;"><<TableOfContents>>||
Line 5: Line 5:
(Based on an email from Curtis Hovey.) Our triage process is basically this: make sure that ''Critical'' and ''High'' bugs are correctly marked.
Line 7: Line 7:
The process of triaging issues (bugs, features, and tasks) has one crucial principle: Prioritise the work according to ''need'' and ''certainty''. We want:
Line 9: Line 9:
Work is prioritised because there are not enough engineers to do all the work. Some features will never be completed; some bugs will never be fixed. Triage determines which bugs can and will be fixed and which features can and will be implemented. ''Need'' is generally understood when planning work, but ''certainty'' is not, and that often leads to wasted work and unmet expectations.  * ''Critical'' bugs to be those that need attention before all others. Right now: regressions, stakeholder-escalated bugs, operational issues (e.g. build breakage, code issues causing deployment failures, things preventing us from detecting other failures such as cronspam, things that should OOPS but cannot for lack of tooling) and bugs that are dependencies of other critical bugs.
 * The ''High'' bugs list to be our main set of top priorities. Some specific sorts of bugs we always treat as high. Right now: OOPSes, timeouts, and A and AA conformance [[PolicyAndProcess/Accessibility|accessibility bugs]].
Line 11: Line 12:
By ''need'', I mean a measure of severity: what percentage of users does the issue affect, and how severely does it impede them from completing their task? {{{#!wiki note
We would prefer to be able to treat OOPSes and timeouts as critical (as was the case until 2020), but having a practically-usable Critical queue takes priority.
Line 13: Line 15:
By ''certainty'', I mean a measure of how certain the engineers are that they can address the issue. Time is also a factor in this measure: the longer an issue takes to address, the more likely it is that the conditions first judged will change. We are currently reviewing previously-triaged bugs. Prior to 2020, the Critical and High queues grew significantly, and many bugs that were marked as such due to their urgency are less urgent when assessed today. By significantly pruning these lists we can ensure that we're focussing our time and energy on the most important priorities.
}}}
Line 15: Line 18:
The act of triage is separating work into groups to be worked on now, next and last. There can only be as many "now" bugs or features as there are engineers. The amount of "next" work is limited by the velocity of the engineers and by how often plans change. The bugs that are last will probably never be addressed; the last features may never be started. For a full understanding of why we triage bugs and how we came to develop this process, please read our description of the [[/Background|background to our bug triage process]].
Line 17: Line 20:
The corollary to this rule is that there is a finite number of bugs or features in the first two groups. There cannot be more work in these groups than the engineers can do in the given period of time; otherwise the engineers, businesses and users are being misinformed about when issues will be addressed. == How to triage ==
Line 19: Line 22:
== An Example == These are the questions we ask when triaging bug reports about [[https://bugs.launchpad.net/launchpad-project|launchpad-project]]:
Line 21: Line 24:
Suppose there is one engineer and two bugs. He can only work on one bug at a time. One bug is more important than the other. The risk is that he may not be able to fix one of the bugs before users are disappointed and abandon the application. He risks disappointing all users if he fixes neither bug because he chose the one with the most need over the one he was certain he could address.  1. '''Is this a bug in launchpad-project?''' If not, move it to the appropriate project or distribution (e.g. Ubuntu) and move to the next bug. Note that bugs in lazr.restful, loggerhead etc. '''are''' bugs in launchpad-project. To move the bug, click the dropdown button on the left side of the ''Affects'' column and then move it to the appropriate project or distribution.
 1. '''Is this bug on the right subproject?''' If not, move it to the right subproject.
 1. '''Is it a duplicate?''' If there is a duplicate, mark the newer bug as a duplicate of the older bug.
 1. '''Is it something we will not do and would not accept a patch to do?''' If so, mark it as ''Won't Fix''.
 1. '''Is it an operational request?''' If yes, convert it to a question.
 1. '''When are we likely to fix this?''' Set the importance to show when we'll get to fixing this bug ([[#importance|read more about choosing an importance]]).
 1. '''Does the report have enough detail?''' If we couldn't replicate or otherwise begin work on the bug with the information provided, request further information from the reporter, mark the bug as ''Incomplete'', and move to the next bug. If someone has already asked for more info and the reporter has replied, change the status from ''Incomplete'' to ''Triaged''.
 1. '''Set the status to ''Triaged'''''.
Line 23: Line 33:
If he does not know how to fix the bug with the most need, or if the fix takes a long time, he is wasting time he could have spent fixing the bug with more certainty. The only way he can address the bug with the most need is to employ a hack that reduces the need and meets the expectations of some users. The hack also buys time to understand the problem, and thus increases certainty. If you're uncertain what importance to give a bug, chat with another engineer. If there's a disagreement, let common sense and courtesy take priority.

Need help? [[Help|Talk to someone]].

== Quick links ==

||<tablestyle="width: 60%;" style="background: #2a2929; font-weight: bold; color: #f6bc05;">All of Launchpad||[[https://bugs.launchpad.net/launchpad-project|All]]||[[https://bugs.launchpad.net/launchpad-project/+bugs?search=Search&field.status=New|New]]||[[https://bugs.launchpad.net/launchpad-project/+bugs?field.status:list=NEW&field.status:list=INCOMPLETE_WITH_RESPONSE&field.status:list=INCOMPLETE_WITHOUT_RESPONSE&field.status:list=CONFIRMED|Untriaged bugs with no importance]]||[[https://bugs.launchpad.net/launchpad-project/+bugs?field.status:list=NEW&field.status:list=INCOMPLETE_WITH_RESPONSE&field.status:list=INCOMPLETE_WITHOUT_RESPONSE&field.status:list=CONFIRMED|Untriaged bugs that have a status]]||[[https://bugs.launchpad.net/launchpad-project/+bugs?field.searchtext=&orderby=-importance&search=Search&field.status%3Alist=TRIAGED&assignee_option=any&field.assignee=&field.bug_reporter=&field.bug_supervisor=&field.bug_commenter=&field.subscriber=&field.tag=&field.tags_combinator=ANY&field.has_cve.used=&field.omit_dupes.used=&field.omit_dupes=on&field.affects_me.used=&field.has_patch.used=&field.has_branches.used=&field.has_branches=on&field.has_no_branches.used=&field.has_no_branches=on&field.has_blueprints.used=&field.has_blueprints=on&field.has_no_blueprints.used=&field.has_no_blueprints=on|Triaged]]||[[https://bugs.launchpad.net/launchpad-project/+bugs?search=Search&field.importance=Critical&field.status=New&field.status=Incomplete&field.status=Confirmed&field.status=Triaged&field.status=In+Progress&field.status=Fix+Committed|Critical]]||
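
The quick links above are ordinary bug-search URLs. As a minimal sketch, assuming the query parameters behave the way they appear in those links (`search`, `field.importance`, and a repeated `field.status`), a URL of the same shape could be assembled like this. The helper name `search_url` is hypothetical and is not part of Launchpad.

```python
from urllib.parse import urlencode

# Base search page for launchpad-project, taken from the quick links.
BASE = "https://bugs.launchpad.net/launchpad-project/+bugs"


def search_url(statuses, importance=None):
    """Build a bug-search URL (hypothetical helper, for illustration only).

    `statuses` is a list such as ["New", "Triaged"]; each one becomes a
    repeated `field.status` parameter, as in the Critical quick link.
    """
    params = [("search", "Search")]
    if importance is not None:
        params.append(("field.importance", importance))
    params.extend(("field.status", status) for status in statuses)
    # urlencode encodes spaces as "+", e.g. "In Progress" -> "In+Progress".
    return BASE + "?" + urlencode(params)


print(search_url(["New"]))
```

Passing `importance="Critical"` with the six statuses used in the Critical link (New, Incomplete, Confirmed, Triaged, In Progress, Fix Committed) yields a URL of the same form as that link.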

<<Anchor(importance)>>
= Importance =

We use three of Launchpad's bug importances and give each a specific meaning.

||<tablestyle="width: 60%;" rowstyle="background: #2a2929; font-weight: bold; color: #f6bc05;">~+Importance+~||~+Meaning+~||
||<style="font-weight: bold; color: #e01010;"> ~+{{attachment:bug-critical.png}} Critical+~||Bugs that need to jump the queue. When all is well, we should have no Critical bugs.||
||<style="color: #f96413; font-weight: bold;">~+{{attachment:bug-high.png}} High+~||Bugs that are our main priority for attention.||
||<style="color: #d1d03c; font-weight: bold;">~+{{attachment:bug-low.png}} Low+~||All other bugs.||

The importance of a particular bug report reflects the priorities of the Launchpad project. Individuals working on Launchpad may have different priorities. ([[#selecting|Read more about selecting bugs to work on]])

<<Anchor(critical)>>
== Critical ==

Any bug marked ''Critical'' takes priority over all other bugs.

At present, security bugs, regressions (including supported-browser issues) and stakeholder escalations are all marked as ''Critical''. Non-security bugs should also be tagged "regression" etc. so that the reason for their importance is clear. Other types of bug may also be ''Critical''; project leads will expect you to justify marking any other type of bug as ''Critical''.

If all is well with Launchpad, there should be no ''Critical'' bugs.
Line 26: Line 63:
== Only Assign Work that You Are Committing to Do in the Near Future == <<Anchor(high)>>
== High ==
Line 28: Line 66:
When work is assigned to an engineer, he is committing to complete the work in the near future. What the "near future" means is different for each project. I suggest 3 releases is the "near future", because when work is planned, the engineer is thinking about now, next, and last. For some projects this period might be 6 weeks, for others, 6 months. These are the bugs that will be our main focus in normal operation. They always include timeouts (tagged "timeout"), OOPSes (tagged "oops", thanks to our [[https://dev.launchpad.net/PolicyAndProcess/ZeroOOPSPolicy|zero OOPS policy]]), and A and AA conformance [[PolicyAndProcess/Accessibility|accessibility bugs]].
Line 30: Line 68:
I prefer to plan for the current release and the next one. As work is reprioritised, it may be rescheduled to the third release. I do not think it is wise to plan a bug or feature to be completed in the third release because, if it slips to the fourth or fifth release, I doubt that it was correctly prioritised as high. <<Anchor(low)>>
== Low ==
Line 32: Line 71:
Any high work that is assigned to an engineer for more than 3 releases was not high. If it were, the work would have been reassigned to someone who could complete it in the scheduled time. Any other work that is assigned for more than 1 release is also misprioritised. You are lying to yourself, and to the project's users, when you assign work that you are not committing to fix. We mark as ''Low'' any bug that we recognise as legitimate but that is '''not''' a priority for Canonical staff to fix. This is not the same as planning not to fix the bug; it means that we don't know when we will fix it, if at all. This includes AAA conformance [[PolicyAndProcess/Accessibility|accessibility bugs]].
Line 34: Line 73:
== Practical Classifications of Importance == == Others ==
Line 36: Line 75:
Work is often classified in relative terms. It is better to classify work according to how it is managed, to convey when and under what terms a bug will be fixed or a feature will be completed. There are three priorities that work can be classified as: We do not use ''Medium'' or ''Wishlist''. This is primarily to avoid giving false hope to people who are interested in a bug that is neither ''Critical'' nor ''High'': if it does not have one of these importances, we think it is unlikely we will focus effort on it.
Line 38: Line 77:
Critical:: The bug dramatically impairs users. Users may lose their data. Users cannot complete crucial tasks. The feature is needed to encourage adoption or prevent abandonment of the project. = Tagging bugs =
Line 40: Line 79:
    Synonyms: required, now, must do We tag bugs as part of the triage process. Read the [[https://dev.launchpad.net/LaunchpadBugTags|list of Launchpad tags]] to find out which tags to use.
Line 42: Line 81:
    The work is immediately assigned to an engineer. It is his top priority to fix. Team members help the engineer to plan and do the work. The work is released as soon as it is deployable; in the case of a bug, it is released outside of the release schedule. = Assigning bugs =
Line 44: Line 83:
High:: The bug prevents users from completing their tasks. The feature provides new kinds of tasks or new ways of completing tasks. We do not assign bugs as part of the triage process. Only ''In progress'' bugs should be assigned to someone.
Line 46: Line 85:
    Synonyms: expected, next, can do, should do Even ''Critical'' bugs do not need an assignee, unless they are being worked on. Being at the top of the queue is all we need for ''Critical'' bugs to get the attention they require.
Line 48: Line 87:
    The work is assigned to an engineer to be completed in the next 3 releases. The engineer may choose to do other work if he believes it is within the scope of the high priority work. <<Anchor(selecting)>>
= Selecting bugs to work on =
Line 50: Line 90:
Low:: The bug is an inconvenience to users, but it does not prevent them from completing their tasks. The feature is a convenience to users. If you are working on Launchpad in your own time you'll most likely want to fix those bugs that matter to you, regardless of what importance the Launchpad project gives them. That's great and we welcome all bug fixes; we encourage you to look at [[FixBugs|our page about fixing bugs]] first.
Line 52: Line 92:
    Synonyms: optional, last, may do Members of Canonical's Launchpad team will select bugs as seems appropriate to them.
Line 54: Line 94:
    The engineer may assign the work to himself while working on high priority work because the high work provides an opportunity to complete the low priority work at less cost. If the low work in any way jeopardises the high priority work, the low work is unassigned. The engineer is thus ''certain'' that the work can be fixed quickly and without difficulty. A corollary to this rule is that low work that is assigned to an engineer must be in the "in progress" or "fixed" state. <<Anchor(quarterly)>>
= Quarterly review =
Line 56: Line 97:
Four times a year, we put all of the ''High'' bugs back through the triage process. This lets us make sure that all those bugs really should be ''High'' and to take account of anything that has changed since they were last triaged.
Line 57: Line 99:
== The Problem with "Medium" == = Resolving disputes =
Line 59: Line 101:
It might be argued that when the engineer has an opportunity to fix a low or a medium bug, he must choose the medium one. This rule does not define a practical distinction between medium and low. There is no commitment to fix the medium bug; it will not be scheduled for fixing. An engineer chooses to undertake a low bug because he sees an opportunity to fix it while working in the affected code. The engineer is choosing to do unscheduled work because he is ''certain'' it does not jeopardise his scheduled work. The engineer might see an opportunity to fix a medium and a low bug at the same time, but that is unlikely. Beyond these rules, a bug is more important than another bug if fixing it will improve Launchpad more than fixing the other bug.
Line 61: Line 103:
It can also be argued that 'critical' is 'high' and that 'high' is 'medium'. True, that is a matter of semantics. The crux of the issue is that there are three practical classifications of work. The words chosen to describe the classifications could use the tofu scale of hard, firm, and soft. People who are unfamiliar with triage will appreciate names that convey the kind of attention the issue will receive. Discretion and a feel for what's in the bug database will help a lot here, as will awareness of our userbase and their needs. One sensible heuristic is to look at five to ten existing ''High'' bugs and, if the new bug is less important than all of them, mark it ''Low'' as it's probably less important than all existing ''High'' bugs.
Line 63: Line 105:
Engineers have discretion to decide any particular bug should be sorted higher (or lower) than it has been; some change requests are very important to many of our users while still not big enough to need a dedicated team working on them.
Line 64: Line 107:
== Consequences of Misprioritised Work ==

Stakeholders often use reports that list the prioritised work for a release and for each engineer. When work is misclassified there are two commonly observed consequences: a decrease in certainty, and a decrease in communication.

In the first consequence, the engineer's effort may be wasted; there are issues that have more ''need and certainty''. Engineers, and other stakeholders, are often tempted to complete the misdirected work after the misclassification is discovered because it is assumed that it is better to always deliver something finished than nothing at all. This is a risky choice, because it jeopardises work in future releases. By working on less important work, the engineer is decreasing the certainty of the more important work.

The second consequence is that the engineer ignores the list and works on issues according to some other source, such as the opinion of another stakeholder. While the engineer is working on the correct issue, it is unclear to other parties what work is going on and when it will be completed. Users may abandon the project in frustration. Planners cannot coordinate all the stakeholders.

The first consequence is possibly a failure to reprioritise during the triage process, but the second consequence is a total failure of the triage process. Why would anyone do triage if the prioritisation will be ignored? How can work be coordinated if the work is unknown to all stakeholders? Why would users trust a project if it does not do what it says it will do?

Work must be reprioritised during the triage process to ensure that engineers are working on the issues with the most need and certainty. Engineers must work from the list of prioritised issues.

== Indicators of Misprioritised Work ==

The rules of practical classification provide tests for misprioritised bugs, features, or tasks.

 * The work is critical, but it is not assigned and targeted for release.
 * The work is prioritised as high, but it is not assigned and targeted for a release.
 * The work is high, but has not been worked on in 3 releases.
 * The work is low and unassigned, yet it is targeted for a release.
 * The work is low and assigned, but the engineer is not working on it.
 * The work is considered to be triaged, but its priority is not critical, high, or low.
 * An engineer is assigned more work than he can accomplish in 3 releases, and it cannot be reassigned.
When two engineers disagree, or if someone in the management chain disagrees, common sense and courtesy should be used in resolving the disagreement.

Triaging Launchpad project bugs

Our triage process is basically this: make sure that Critical and High bugs are correctly marked.

We want:

  • Critical bugs to be those that need attention before all others. Right now: regressions, stakeholder-escalated bugs, operational issues (e.g. build breakage, code issues causing deployment failures, things preventing us from detecting other failures such as cronspam, things that should OOPS but cannot for lack of tooling) and bugs that are dependencies of other critical bugs.

  • The High bugs list to be our main set of top priorities. Some specific sorts of bugs we always treat as high. Right now: OOPSes, timeouts, and A and AA conformance accessibility bugs.

We would prefer to be able to treat OOPSes and timeouts as critical (as was the case until 2020), but having a practically-usable Critical queue takes priority.

We are currently reviewing previously-triaged bugs. Prior to 2020, the Critical and High queues grew significantly, and many bugs that were marked as such due to their urgency are less urgent when assessed today. By significantly pruning these lists we can ensure that we're focussing our time and energy on the most important priorities.

For a full understanding of why we triage bugs and how we came to develop this process, please read our description of the background to our bug triage process.

How to triage

These are the questions we ask when triaging bug reports about launchpad-project:

  1. Is this a bug in launchpad-project? If not, move it to the appropriate project or distribution (e.g. Ubuntu) and move to the next bug. Note that bugs in lazr.restful, loggerhead etc. are bugs in launchpad-project. To move the bug, click the dropdown button on the left side of the Affects column and then move it to the appropriate project or distribution.

  2. Is this bug on the right subproject? If not, move it to the right subproject.

  3. Is it a duplicate? If there is a duplicate, mark the newer bug as a duplicate of the older bug.

  4. Is it something we will not do and would not accept a patch to do? If so, mark it as Won't Fix.

  5. Is it an operational request? If yes, convert it to a question.

  6. When are we likely to fix this? Set the importance to show when we'll get to fixing this bug (read more about choosing an importance).

  7. Does the report have enough detail? If we couldn't replicate or otherwise begin work on the bug with the information provided, request further information from the reporter, mark the bug as Incomplete, and move to the next bug. If someone has already asked for more info and the reporter has replied, change the status from Incomplete to Triaged.

  8. Set the status to Triaged.
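
The eight questions above can be read as an ordered decision procedure. The following is a minimal sketch of that reading, not a Launchpad API: the `Report` fields and the `triage` function are hypothetical names, and the answers are supplied by the person triaging. Stopping after questions 3, 4 and 5 is an assumption; the text only says "move to the next bug" for questions 1 and 7.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Report:
    """Hypothetical record of a triager's answers for one bug report."""
    in_launchpad_project: bool = True
    on_right_subproject: bool = True
    is_duplicate: bool = False
    wont_do: bool = False                 # and we would not accept a patch
    is_operational_request: bool = False
    has_enough_detail: bool = True
    importance: str = "Low"               # chosen at question 6


def triage(report: Report) -> List[str]:
    """Return the actions to take, following the questions in order."""
    if not report.in_launchpad_project:                      # question 1
        return ["move to the appropriate project or distribution"]
    actions: List[str] = []
    if not report.on_right_subproject:                       # question 2
        actions.append("move to the right subproject")
    if report.is_duplicate:                                  # question 3 (assumed to stop here)
        return actions + ["mark as a duplicate of the older bug"]
    if report.wont_do:                                       # question 4 (assumed to stop here)
        return actions + ["set status to Won't Fix"]
    if report.is_operational_request:                        # question 5 (assumed to stop here)
        return actions + ["convert to a question"]
    actions.append(f"set importance to {report.importance}")  # question 6
    if not report.has_enough_detail:                         # question 7
        return actions + ["request more information from the reporter",
                          "set status to Incomplete"]
    actions.append("set status to Triaged")                  # question 8
    return actions


print(triage(Report(importance="High")))
```

A report that passes every check ends with its importance set and its status set to Triaged; a report lacking detail gets an importance but is left Incomplete.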

If you're uncertain what importance to give a bug, chat with another engineer. If there's a disagreement, let common sense and courtesy take priority.

Need help? Talk to someone.

Quick links

All of Launchpad: All, New, Untriaged bugs with no importance, Untriaged bugs that have a status, Triaged, Critical

Importance

We use three of Launchpad's bug importances and give each a specific meaning.

||Importance||Meaning||
||Critical||Bugs that need to jump the queue. When all is well, we should have no Critical bugs.||
||High||Bugs that are our main priority for attention.||
||Low||All other bugs.||

The importance of a particular bug report reflects the priorities of the Launchpad project. Individuals working on Launchpad may have different priorities. (Read more about selecting bugs to work on)

Critical

Any bug marked Critical takes priority over all other bugs.

At present, security bugs, regressions (including supported-browser issues) and stakeholder escalations are all marked as Critical. Non-security bugs should also be tagged "regression" etc. so that the reason for their importance is clear. Other types of bug may also be Critical; project leads will expect you to justify marking any other type of bug as Critical.

If all is well with Launchpad, there should be no Critical bugs.

High

These are the bugs that will be our main focus in normal operation. They always include timeouts (tagged "timeout"), OOPSes (tagged "oops", thanks to our zero OOPS policy), and A and AA conformance accessibility bugs.

Low

We mark as Low any bug that we recognise as legitimate but that is not a priority for Canonical staff to fix. This is not the same as planning not to fix the bug; it means that we don't know when we will fix it, if at all. This includes AAA conformance accessibility bugs.

Others

We do not use Medium or Wishlist. This is primarily to avoid giving false hope to people who are interested in a bug that is neither Critical nor High: if it does not have one of these importances, we think it is unlikely we will focus effort on it.
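
As a rough illustration of the three importances, the sketch below maps the kinds of bug named above to Critical, High or Low. It only covers the kinds that the text says are always treated a certain way; other bugs can still be High or Critical by judgement. The tags "regression", "timeout" and "oops" come from the text, while the other labels and the function name `choose_importance` are hypothetical.

```python
# Kinds of bug the text says are marked Critical at present.
# Only "regression" is a tag named in the text; the rest are made-up labels.
CRITICAL_KINDS = {"security", "regression", "stakeholder-escalation"}

# Kinds of bug the text says are always treated as High.
# "timeout" and "oops" are tags named in the text; the accessibility
# labels are made up for this sketch.
HIGH_KINDS = {"timeout", "oops", "accessibility-a", "accessibility-aa"}


def choose_importance(kinds):
    """Return "Critical", "High" or "Low" for a collection of kind labels."""
    kinds = set(kinds)
    if kinds & CRITICAL_KINDS:
        return "Critical"
    if kinds & HIGH_KINDS:
        return "High"
    # Everything else, including AAA accessibility bugs, defaults to Low.
    return "Low"


print(choose_importance(["oops", "timeout"]))
```

Medium and Wishlist never appear as return values, which mirrors the rule that only three importances are used.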

Tagging bugs

We tag bugs as part of the triage process. Read the list of Launchpad tags to find out which tags to use.

Assigning bugs

We do not assign bugs as part of the triage process. Only In progress bugs should be assigned to someone.

Even Critical bugs do not need an assignee, unless they are being worked on. Being at the top of the queue is all we need for Critical bugs to get the attention they require.

Selecting bugs to work on

If you are working on Launchpad in your own time you'll most likely want to fix those bugs that matter to you, regardless of what importance the Launchpad project gives them. That's great and we welcome all bug fixes; we encourage you to look at our page about fixing bugs first.

Members of Canonical's Launchpad team will select bugs as seems appropriate to them.

Quarterly review

Four times a year, we put all of the High bugs back through the triage process. This lets us make sure that all those bugs really should be High and to take account of anything that has changed since they were last triaged.

Resolving disputes

Beyond these rules, a bug is more important than another bug if fixing it will improve Launchpad more than fixing the other bug.

Discretion and a feel for what's in the bug database will help a lot here, as will awareness of our userbase and their needs. One sensible heuristic is to look at five to ten existing High bugs and, if the new bug is less important than all of them, mark it Low as it's probably less important than all existing High bugs.
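
The heuristic above can be sketched as a small helper. This is only an illustration: the comparison itself is a human judgement, so it is passed in as a function, and the names `probably_low` and `is_less_important` are hypothetical. The sample size defaults to five, the lower end of the range suggested above.

```python
import random


def probably_low(new_bug, high_bugs, is_less_important, sample_size=5, rng=None):
    """Return True if the new bug is less important than every sampled High bug.

    `is_less_important(a, b)` stands in for the triager's judgement that
    bug `a` matters less than bug `b`.
    """
    rng = rng or random.Random()
    pool = list(high_bugs)
    sample = rng.sample(pool, min(sample_size, len(pool)))
    if not sample:
        # With no High bugs to compare against, the heuristic says nothing.
        return False
    return all(is_less_important(new_bug, existing) for existing in sample)


# Toy usage: bugs are represented by numeric scores, higher meaning more important.
print(probably_low(1, [5, 6, 7, 8, 9], lambda a, b: a < b))
```

A result of False does not mean the bug is High; it only means the heuristic did not settle the question and discretion is still needed.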

Engineers have discretion to decide any particular bug should be sorted higher (or lower) than it has been; some change requests are very important to many of our users while still not big enough to need a dedicated team working on them.

When two engineers disagree, or if someone in the management chain disagrees, common sense and courtesy should be used in resolving the disagreement.

BugTriage (last edited 2022-04-19 09:36:39 by lgp171188)