SearchingBugs


This is a functional specification, not an incremental one. It describes many small features (most of which can be implemented independently), so we can tell whether the features are logically consistent and work together to satisfy likely use cases.

Summary

To cope with vast numbers of bug reports, Launchpad’s bug searching abilities should include functions available in modern search engines and other bug trackers: defaulting to case-insensitive substrings, searching all open bug reports, boolean and other operators for text-only advanced searches, phrase searches and bug number disambiguation, advanced operator suggestions, spelling and punctuation canonicalization, ignoring stop words by default, and spelling suggestions. The text syntax for advanced searches should make results pages simpler, by letting the search be displayed as a line of text. This would subtly train people in the advanced search syntax, but Launchpad should also link to a page of search tips.

Rationale

The most common function in any bug tracker is searching — finding bugs that need work, and checking whether bugs have already been reported. This is even more important with Launchpad, which will eventually contain bug reports about thousands of products, tens of thousands of packages, and dozens of distributions, and which will be used by hundreds of thousands of people of varying abilities. So Launchpad's search must be very powerful and very smart.

Use cases

Implementation progress

{o} unimplemented
/!\ in progress
{*} implemented

Defaulting to case-insensitive substrings

{o} Like Bugzilla has always done, Launchpad should default to searching for case-insensitive substrings -- so that, for example, searching for mail finds bug reports containing e-mail, email, $MAILER, mailing, or mailed.

{o} Exact words can be found using the phrase search; for example, searching for "mail" finds only bug reports containing the exact word mail. (This helps satisfy Matt's use case.)
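The difference between the default and the exact-word behaviour can be sketched in a few lines of Python. This is only an illustration, not Launchpad's implementation: the function name `term_matches` is hypothetical, and treating a hyphen as a word boundary (so that the exact word mail would also be found inside e-mail) is an assumption the specification does not settle.

```python
import re


def term_matches(term, text, exact_word=False):
    """Case-insensitive substring match by default; whole-word match on request."""
    if exact_word:
        # \b restricts the match to a standalone word (assumed boundary rule).
        pattern = r"\b" + re.escape(term) + r"\b"
        return re.search(pattern, text, re.IGNORECASE) is not None
    return term.lower() in text.lower()
```

For instance, `term_matches("mail", "I mailed it")` is true, while the same call with `exact_word=True` is false.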

Searchable text

{o} The searchable text for a bug report should be its summary, description, context names, and context display names. It should not include reporters, assignees, statuses, importances, or milestones; these are handled by operator suggestions. Nor should it include comments; searchworthy comment text should be included in updated bug descriptions. (This helps satisfy Ernie's use case.)

Searching all open bugs

{o} https://bugs.launchpad.net/ (and, before that host exists, https://launchpad.net/bugs) should feature a form letting people search for bug reports in any product, package, or distribution across Launchpad. (This helps satisfy Matt's, Ernie's, Marina's, and Tarina's use cases.)

Boolean operators

{o} By default, Launchpad should search for bug reports containing all the terms searched for (except for stop words).

{o} Searches for any of a group of terms should be enterable using | or OR. Without brackets, this applies only to the two words and/or phrases on either side of the operator: for example, print dialog OR window should return bug reports containing print and either dialog or window. This should be alterable using brackets: (print dialog) OR window should return bug reports containing print and dialog, and also bug reports containing window.

{o} Unmatched brackets should be ignored for the purpose of the search. They should still be presented in the input field for editing, but should be omitted from the search string displayed next to the number of results.

{o} A - sign in front of any search term should exclude bugs matching that term. For example, -wrap should exclude bug reports containing the word wrap, and dialog -(save OR print) should return bug reports about dialogs other than the Save and Print dialogs.

{o} Boolean operators should also be usable with advanced operators and phrase searches, once they are implemented; for example -reporter:mdz should exclude bugs reported by Matt Zimmerman, and -"merge new package" should exclude bug reports asking for new packages to be merged.
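The precedence rules above (OR binds more tightly than the implicit AND, brackets regroup, - negates, unmatched brackets are ignored) can be sketched as a small recursive-descent parser. This is a minimal illustration under stated assumptions, not Launchpad code: the names `tokenize`, `parse`, and `evaluate` are hypothetical, phrase searches and advanced operators are left out, terms are matched as plain case-insensitive substrings, and an operator with a missing operand is simply dropped.

```python
import re

# Brackets and "|" are single tokens; everything else is split on whitespace.
TOKEN = re.compile(r"[()|]|[^\s()|]+")
OR_TOKENS = ("|", "OR")


def tokenize(query):
    """Split a query into tokens, dropping unmatched brackets."""
    tokens = TOKEN.findall(query)
    unmatched, open_stack = set(), []
    for i, tok in enumerate(tokens):
        if tok == "(":
            open_stack.append(i)
        elif tok == ")":
            if open_stack:
                open_stack.pop()
            else:
                unmatched.add(i)
    unmatched.update(open_stack)
    return [t for i, t in enumerate(tokens) if i not in unmatched]


def parse(query):
    """Parse into nested ("and" | "or" | "not" | "term", payload) tuples."""
    tokens, pos = tokenize(query), 0

    def parse_and():
        nonlocal pos
        parts = []
        while pos < len(tokens) and tokens[pos] != ")":
            node = parse_or()
            if node is not None:
                parts.append(node)
        return ("and", parts)

    def parse_or():
        nonlocal pos
        alternatives = [parse_unary()]
        while pos < len(tokens) and tokens[pos] in OR_TOKENS:
            pos += 1
            alternatives.append(parse_unary())
        alternatives = [a for a in alternatives if a is not None]
        if not alternatives:
            return None
        return alternatives[0] if len(alternatives) == 1 else ("or", alternatives)

    def parse_unary():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] in OR_TOKENS + (")",):
            return None  # missing operand: leave the token for the caller
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            node = parse_and()
            pos += 1  # skip the matching ")"
            return node
        if tok == "-":  # a bare "-" negates the bracketed group that follows
            operand = parse_unary()
            return None if operand is None else ("not", operand)
        if tok.startswith("-") and len(tok) > 1:
            return ("not", ("term", tok[1:].lower()))
        return ("term", tok.lower())

    return parse_and()


def evaluate(node, text):
    """Return True if the text satisfies the parsed query."""
    kind, payload = node
    if kind == "term":
        return payload in text.lower()
    if kind == "not":
        return not evaluate(payload, text)
    results = [evaluate(child, text) for child in payload]
    return all(results) if kind == "and" else any(results)
```

With this sketch, `parse("print dialog OR window")` requires print plus either dialog or window, whereas `parse("(print dialog) OR window")` also accepts a report that mentions only window.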

Advanced operators

It should be possible to enter advanced operators anywhere in a search string. (This helps satisfy Matt's use case.)

||Operator||Examples||
||{o} assignee:||assignee: assignee:mpt assignee:mpt@myrealbox.com assignee:mpt@canonical.com||
||{o} attachment:||attachment: attachment:patch attachment:image/*||
||{o} cc:||cc:mpt cc:mpt@myrealbox.com cc:mpt@canonical.com||
||{o} cve:||cve: -cve: cve:1999-0067||
||{o} distro:||distro: distro:impilinux distro:ubuntu/dapper distro:ubuntu/5.10||
||{o} duplicate:||duplicate: duplicate:1234||
||{o} elsewhere:||elsewhere: elsewhere:invalid,wontfix elsewhere:open elsewhere:fixed||
||{o} importance:||importance: importance:high importance:>medium importance:<=low||
||{o} involving:||involving:mpt involving:mpt@myrealbox.com involving:mpt@canonical.com||
||{o} keyword:||keyword:design -keyword:privacy||
||{o} project:||project: project:firefox project:firefox/2.0||
||{o} projectgroup:||projectgroup:gnome -projectgroup:mozilla||
||{o} package:||package: package:gnome-panel package:ubuntu/mozilla-firefox||
||{o} private:||private:, -private:||
||{o} reporter:||reporter:mpt reporter:mpt@myrealbox.com reporter:mpt@canonical.com||
||{o} repository:||repository:main -repository:restricted||
||{o} status:||status:fix-released status:open -status:fixed status:any||
||{o} tag:||tag: tag:crash,crasher -tag:silly||

The operators attachment:, cve:, distro:, importance:, package:, project:, and tag: should be usable standalone, and the operator private: should work only when used standalone. Where an operator is used standalone, it should filter based on whether the relevant property exists at all. For example, assignee: should return only those bugs that are assigned to someone, cve: should return only those bugs that are associated with a CVE, and project: should return only those bugs that are recorded as occurring upstream. (-project: satisfies Sebastien's first use case, by returning bugs that are not associated with a project.)

{o} For any advanced operator that takes values, either "," or "|" should work to enter multiple values. For example, status:unconfirmed,needs-info should return bug reports with statuses of either Unconfirmed or Needs Info. status:open, a shortcut value for status:unconfirmed,confirmed,needs-info,in-progress, should be an implicit default (except on a person's Bugs pages), so the shortcut value status:any should be usable to include resolved bugs as well. There should be four shortcut values in total:

||Shortcut value||{o} open||{o} any||{o} development||{o} fixed||
||Unconfirmed||*||*|| || ||
||Needs Info||*||*|| || ||
||Confirmed||*||*||*|| ||
||In Progress||*||*||*|| ||
||Fix Committed|| ||*||*||*||
||Fix Released|| ||*|| ||*||

(elsewhere:fixed satisfies Sebastien's second use case, by returning bugs that are not fixed in his distro but are fixed somewhere else.)
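The multiple-value syntax and the shortcut values can be sketched as follows. This is an illustration only. The function name `expand_status` is hypothetical, the value spelling fix-committed is assumed by analogy with fix-released, and the memberships of development and fixed are read from the shortcut table above rather than stated in prose.

```python
import re

# Canonical status order, as listed in the shortcut table.
ALL_STATUSES = [
    "unconfirmed", "needs-info", "confirmed",
    "in-progress", "fix-committed", "fix-released",
]

SHORTCUTS = {
    "open": ["unconfirmed", "needs-info", "confirmed", "in-progress"],
    "any": list(ALL_STATUSES),
    "development": ["confirmed", "in-progress", "fix-committed"],  # read from the table
    "fixed": ["fix-committed", "fix-released"],                    # read from the table
}


def expand_status(value):
    """Split a status: value on ',' or '|' and expand any shortcut values.

    Returns (known statuses in canonical order, unrecognized values).
    """
    wanted, unknown = set(), []
    for part in re.split(r"[,|]", value):
        part = part.strip().lower()
        if not part:
            continue
        if part in SHORTCUTS:
            wanted.update(SHORTCUTS[part])
        elif part in ALL_STATUSES:
            wanted.add(part)
        else:
            unknown.append(part)
    return [s for s in ALL_STATUSES if s in wanted], unknown
```

Returning the unrecognized values separately leaves room for the warning behaviour described for unknown values.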

{o} If a search string contains a substring of the form x:y where x is not recognized, x:y should be passed through as a phrase included in the search string (presented as "x y" next to the results count).

{o} If x is a known operator but y is an unknown value, it should similarly be treated as a phrase, but with an added warning y is not a recognized x (or, for the person operators, y is not a registered person).

{o} Either way, unknown operators should also be subject to spell-checking suggestions.
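The fallback rules for unrecognized operators and values might look like this in outline. The function `interpret` and the `known_values` lookup are hypothetical. Negated operators, the person-specific warning, and spell-checking are left out of the sketch.

```python
def interpret(token, known_values):
    """Classify one search token.

    known_values is a hypothetical lookup: operator name -> set of valid values.
    """
    name, sep, value = token.partition(":")
    if not sep:
        return {"kind": "term", "text": token, "warning": None}
    if name not in known_values:
        # Unknown operator: pass x:y through as the phrase "x y".
        return {"kind": "phrase", "text": f"{name} {value}", "warning": None}
    if value and value not in known_values[name]:
        # Known operator, unknown value: phrase plus a warning.
        return {
            "kind": "phrase",
            "text": f"{name} {value}",
            "warning": f"{value} is not a recognized {name}",
        }
    # Known operator with a known value, or used standalone.
    return {"kind": "operator", "text": token, "warning": None}
```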

{o} Switching from the advanced to the simple search form should convert the graphical operators to the above text syntax. Conversion the other way should not be done, because it would be unexpectedly inconsistent (not happening for operators embedded in boolean expressions, for example, and not maintaining search term order on a round-trip conversion).

Phrase searches

{o} Putting a phrase in "" quotes should cause Launchpad to search for the phrase rather than the individual words. (This satisfies Pieter-Jahn's use case.)

{o} As with unmatched brackets, unmatched quotes should be ignored.
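A rough sketch of splitting a search string into phrases and words follows. The function name `split_terms` is hypothetical, and the specification does not say which quote counts as the unmatched one; this sketch assumes it is the last quote in the string.

```python
import re


def split_terms(query):
    """Return (kind, text) pairs, where kind is "phrase" or "word"."""
    if query.count('"') % 2:
        # Odd number of quotes: ignore the last one (an assumption).
        i = query.rindex('"')
        query = query[:i] + query[i + 1:]
    terms = []
    for phrase, word in re.findall(r'"([^"]*)"|(\S+)', query):
        if phrase.strip():
            terms.append(("phrase", phrase))
        elif word:
            terms.append(("word", word))
    return terms
```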

Bug number disambiguation

{o} Similarly, putting a number in "" quotes should cause Launchpad to search for the number rather than going to that bug report.

{o} For people who don't realize this, when someone enters a number and Launchpad goes directly to the bug report, the question Were you looking for: bug reports containing "number"? should appear at the top of the bug report. (This satisfies Diogo's use case.)
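The routing decision can be sketched as below. The function name `route` and the shape of its return value are hypothetical; only the wording of the hint comes from this specification.

```python
import re


def route(query):
    """Decide whether a query goes straight to a bug report or to text search.

    Returns (destination, payload, hint).
    """
    q = query.strip()
    if re.fullmatch(r"[0-9]+", q):
        # A bare number jumps to that bug, with a hint offering a text search.
        hint = f'Were you looking for: bug reports containing "{q}"?'
        return ("bug", int(q), hint)
    # Anything else, including a quoted number, is an ordinary search.
    return ("search", q, None)
```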

Advanced operator suggestions

{o} No matter what context the search form is in, if the name of a distribution, distribution release, project, product, package, or bug status, or the ID or e-mail address of a person is present in a search string, Launchpad should present up to two suggestions to refine the search. For example:

{o} These links should be residual: if you follow the bugs in Ubuntu link in that example, links for bugs assigned to Ian Jackson and bugs about Firefox are available on the next results page, so you can progressively narrow down the search.

{o} Launchpad should not make suggestions based on things named after stop words. Launchpad admins should be able to edit the stop words in a single textarea at /malone/+admin.

Spelling and punctuation canonicalization

{o} When indexing and when searching, Launchpad should treat accented characters and other variants as their nearest non-variant equivalent -- treating ö as o, œ as oe, É as e, and so on. This lets people find words even if they do not know how to enter special letters. The mapping table should be tweaked over time to match what people are most likely to type when searching for particular characters. (This satisfies Ernie's use case.)

{o} Hyphens should be treated as any punctuation, as a space in a phrase, or as nothing at all. For example, searching for email should return reports containing either email or e-mail, and so should searching for e-mail. And searching for gnome-app-install should return reports containing gnome-app-install, GNOME App Install, or gnomeappinstall.xml, but not a report containing the phrase can't install a GNOME app.

{o} Similarly, ASCII apostrophes (') and grave and acute accents (` and ´) should be treated either as any punctuation or as nothing at all. For example, searching for doesn´t should return reports containing doesn't, doesn`t, doesn´t, or doesnt -- and so should searching for doesn't, doesn`t, or doesnt. (This helps satisfy Marina's and Tarina's use cases.)

{o} All other punctuation (except for the special uses of :, (...), ", -, and + described elsewhere in this specification) should be treated as a space in a phrase, and multiple spaces should be collapsed. For example, searching for kp_separator should return results containing KP_Separator, kp-separator, or the phrase kp separator. And searching for the phrase "dialog spacing isnt really correct" should return a bug report containing the phrase "dialog spacing   isn't (really) correct".

In all these cases, the search as entered should still be displayed verbatim in the search field, and in the results count on the results page.
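One way to approximate these canonicalization rules is to compare a small set of canonical variants of the query against the same variants of the indexed text. The sketch below is an assumption-laden illustration, not the intended implementation: the names `fold_accents`, `variants`, and `canonical_match` are hypothetical, only the œ mapping comes from this specification (Unicode NFKD decomposition handles ö and É but leaves œ untouched), and the search-syntax characters are assumed to have been stripped before this step.

```python
import re
import unicodedata

# Letters that NFKD does not decompose; extend as the mapping table is tuned.
SPECIAL = {"œ": "oe", "Œ": "OE"}

# Hyphen, apostrophe, grave, and acute: matched as punctuation or as nothing.
JOINERS = "-'`´"


def fold_accents(text):
    """Replace accented letters with their nearest unaccented equivalents."""
    text = "".join(SPECIAL.get(ch, ch) for ch in text)
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def variants(text):
    """Canonical forms of text: joiners removed, and joiners turned into spaces."""
    forms = set()
    for replacement in ("", " "):
        # Handle joiners before folding, since NFKD would split the acute sign.
        s = "".join(replacement if ch in JOINERS else ch for ch in text)
        s = fold_accents(s).lower()
        # All remaining punctuation becomes a single space.
        s = re.sub(r"[\W_]+", " ", s).strip()
        forms.add(s)
    return forms


def canonical_match(query, text):
    """True if any canonical form of the query occurs in any form of the text."""
    return any(q in t for q in variants(query) for t in variants(text))
```

Under these assumptions, `canonical_match("gnome-app-install", "see gnomeappinstall.xml")` is true, while `canonical_match("gnome-app-install", "can't install a GNOME app")` is false, matching the examples above.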

Ignoring stop words by default

{o} Launchpad should ignore common words like a, the, in, and bug when searching, unless they are prefixed with "+" or surrounded by quotes, or unless there are no non-stop words in the search string. For example, searching for bug in the Job pane of the print dialog might return 37 results for Job pane print dialog. As before, the search as entered should still be displayed verbatim in the search field, so it can easily be tweaked by the addition of + or " characters.
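The stop-word rule can be sketched as follows. The function name `effective_terms` is hypothetical, and the stop-word list is illustrative: a, the, in, and bug come from the text above, while of is inferred from the example. Quoted phrases are not handled here.

```python
# Illustrative list; the real list would be editable by Launchpad admins.
STOP_WORDS = {"a", "the", "in", "of", "bug"}


def effective_terms(query):
    """Drop stop words unless prefixed with '+' or nothing else remains."""
    words = query.split()
    kept = []
    for word in words:
        if word.startswith("+") and len(word) > 1:
            kept.append(word[1:])  # '+' forces the word to be searched
        elif word.lower() not in STOP_WORDS:
            kept.append(word)
    # If every word was a stop word, search for all of them after all.
    return kept or words
```

For the example above, `effective_terms("bug in the Job pane of the print dialog")` yields the terms Job, pane, print, and dialog.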

Spelling suggestions

{o} If one or more words in a search string are likely misspellings of other words, these should be presented as suggestions at the end of the search results.

{o} If there are fewer than 20 results for the initial search, and one or more of the words are probably misspelled, and results exist for the most likely suggestion, those results should be displayed automatically underneath the results for the initial search, under the heading <n> results for <adjusted search>:. For example, searching for diaog should automatically return results for dialog.
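As a rough sketch of the suggestion logic, Python's standard `difflib.get_close_matches` can propose a replacement for each unknown word. Treating the indexed vocabulary as a plain list, the 0.8 similarity cutoff, and the function names `suggest` and `should_auto_show` are all assumptions made for illustration.

```python
import difflib


def suggest(query, vocabulary):
    """Return the query with likely misspellings replaced, or None if unchanged.

    vocabulary is a hypothetical list of lowercase words known to the index.
    """
    changed = False
    words = []
    for word in query.split():
        lowered = word.lower()
        if lowered in vocabulary:
            words.append(word)
            continue
        close = difflib.get_close_matches(lowered, vocabulary, n=1, cutoff=0.8)
        if close:
            words.append(close[0])
            changed = True
        else:
            words.append(word)
    return " ".join(words) if changed else None


def should_auto_show(initial_count, suggestion, suggestion_count):
    """Apply the rule above: fewer than 20 results and a productive suggestion."""
    return initial_count < 20 and suggestion is not None and suggestion_count > 0
```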

Presentation of search results

{o} The results for a search should appear like this, using the entire page width to minimize vertical scrolling without compromising the amount of information provided about each report.

{o} If a search has no results, the text "There are no bug reports matching search string." should be printed, along with tips for tweaking the search, and the bottom navigation bar and search form should be omitted.

{o} The first line of a search result should consist of a bug icon representing the bug's importance in the current context (or highest importance value, if the bug is reported on multiple contexts), and a link to the bug report consisting of up to the first 80 characters of the bug's summary, ellipsized at the end if necessary.

{o} The second line should consist of whichever string of up to 80 characters from the bug's description includes the greatest possible number of matching search terms, with ties being broken by beginning at the start of the description if possible, otherwise by balancing context before and after, ideally using whole words. If the description does not include any search terms, the extract should begin at the start of the description. If the chosen extract does not start at the start of the description, it should be preceded by "...", and the same afterward if (as usual) the extract finishes before the end of the description.
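The extract selection can be approximated with a sliding window. This sketch simplifies the rule in several ways, all of them assumptions: it counts distinct matching terms per window, breaks ties by taking the earliest window only (no balancing of context, no whole-word alignment), and matches terms as case-insensitive substrings. The function name `extract` is hypothetical.

```python
def extract(description, terms, width=80):
    """Pick the window of up to `width` characters containing the most terms."""
    lowered = description.lower()
    terms = [t.lower() for t in terms]
    best_start, best_score = 0, -1
    for start in range(max(1, len(description) - width + 1)):
        window = lowered[start:start + width]
        score = sum(1 for t in terms if t in window)
        if score > best_score:  # strict '>' keeps the earliest window on ties
            best_start, best_score = start, score
    snippet = description[best_start:best_start + width]
    prefix = "..." if best_start > 0 else ""
    suffix = "..." if best_start + width < len(description) else ""
    return prefix + snippet + suffix
```

If no term matches, every window scores zero and the extract starts at the beginning of the description, as required.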

{o} Strings in the summary or extract that match the search terms should be <strong>ly emphasized.

{o} The third line should consist of:

Search help

While this specification is being implemented, the operator implemented most recently should be advertised under the relevant search form. For example, when the project: operator is implemented, this should appear under the search form on distribution and package bug search pages:

"More tips" should link to https://launchpad.net/malone/help, which should list all the search features implemented so far that are directly useful (this excludes bug number disambiguation, spelling suggestions etc), in chronological order.

{o} Once all of this specification (apart from this paragraph) has been fully implemented for two weeks, the hint should be removed, the "More tips" link should become "Search tips" alongside the search forms, and /malone/help should be rearranged into logical order.

Bugs that are not addressed by this specification

Unresolved issues

SearchingBugs (last edited 2010-10-14 10:31:28 by mpt)