Differences between revisions 5 and 6

Ratings and reviews implementation

On 2010-02-03 mvo and barry sketched out a design to support ratings and reviews for Lucid (an LTS). Here are our notes.

Overview

We're going to (ab)use Launchpad Answers as the database for ratings and reviews. Every reviewable application will be linked to a question, and we'll store individual reviews as comments on the question. Users will enter reviews on the desktop through Software Center, which will submit them directly to Launchpad via the API. We'll deploy a new service, likely called reviews.ubuntu.com, that will provide application reviews via HTTP GET as one big XML file. reviews.ubuntu.com will talk to the LP API to access and collate question comments. Because it can cache the read-only XML review files, clients reading reviews will not hammer the LP database.

Mapping applications to questions

Questions have a number and are associated with a language. We need to map reviewable applications on the desktop to a question number. We'll do this with a 4-tuple of:

package_name - the binary package name
application name - the application name. This is necessary because binary packages can contain more than one application (this may be "")
application version - the (major.minor?) version number of the application on the desktop
distro name - e.g. Lucid

This mapping will not be stored explicitly in LP. Instead, reviews.u.c will maintain the mapping and use HTTP and URL trickery to expose this mapping to Software Center.
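
As a rough illustration of the mapping described above, here is a minimal in-memory sketch. The names AppKey and QuestionMap are hypothetical and do not come from these notes; a real r.u.c would presumably persist the mapping rather than keep it in a dict.

```python
from typing import Dict, NamedTuple, Optional


class AppKey(NamedTuple):
    """The 4-tuple identifying a reviewable application (hypothetical name)."""
    distro: str        # e.g. "lucid"
    package_name: str  # the binary package name
    app_name: str      # may be "" when the package holds a single application
    app_version: str   # e.g. "23.1"


class QuestionMap:
    """In-memory stand-in for the mapping that r.u.c would maintain."""

    def __init__(self) -> None:
        self._questions: Dict[AppKey, int] = {}

    def register(self, key: AppKey, question_number: int) -> None:
        # Record which question holds the reviews for this application.
        self._questions[key] = question_number

    def lookup(self, key: AppKey) -> Optional[int]:
        # None means no review question is known for this application yet.
        return self._questions.get(key)
```

Because the key is a named tuple, two keys built from the same four values compare equal, so a lookup does not need the original object.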

Curtis:
- We commonly use SourcePackageName which we use to create an instance of a DistributionSourcePackage.
- Answers claims to use SourcePackage, but that is wrong because that limits the answer to a single series. Answers apply to packages in multiple series 99% of the time. The implementation often uses a DistributionSourcePackage because that is the sane object to return.
- So I ask: does a review of a version of "gedit" in Hardy not apply to Karmic? There was only one user-visible change, and that led to 2 bugs being reported. Few users were affected by the spell checker change, so I am sure reviews for Hardy and Karmic are equivalent.

Downloading reviews for display

When Sheila wants to view the reviews for Emacs, she uses Software Center. S/C generates a URL from the 4 pieces of information above, plus the language she wants to see the reviews in. The URL is something like:

http://reviews.ubuntu.com/lucid/emacs/emacs/23.1/en

If some reviews exist for this application, r.u.c will know what question number this is associated with (because it's already retrieved that mapping). r.u.c will respond with an XML file containing the entire current review stream for the application. The response will include the question number, which is required for submitting reviews. Software Center will parse the returned XML file and present it nicely to Sheila in her S/C interface.
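
The URL scheme above could be built and parsed along these lines. This is only a sketch: the function names are hypothetical, lower-casing the distro name is an assumption based on the example URL, and how an empty application name should appear in the path is not settled in these notes (here it simply becomes an empty path segment).

```python
from urllib.parse import quote, unquote, urlsplit

BASE_URL = "http://reviews.ubuntu.com"


def review_url(distro, package_name, app_name, app_version, language):
    """Build the review URL from the 4-tuple plus the language."""
    parts = [distro.lower(), package_name, app_name, app_version, language]
    # Percent-encode each segment so a "/" inside a name cannot add a level.
    return BASE_URL + "/" + "/".join(quote(p, safe="") for p in parts)


def parse_review_url(url):
    """Recover (distro, package, app, version, language) from a review URL."""
    path = urlsplit(url).path
    parts = [unquote(p) for p in path.lstrip("/").split("/")]
    if len(parts) != 5:
        raise ValueError("expected 5 path segments, got %d" % len(parts))
    return tuple(parts)
```

For example, review_url("Lucid", "emacs", "emacs", "23.1", "en") yields the Emacs URL shown above.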

If no review exists yet for the application, r.u.c needs to inform S/C of this, but this introduces a race condition. For example, if Bob wants to review the same version of Emacs as Sheila, who wins? Here are some alternative approaches (comments and other ideas welcome):

Issue a 404

r.u.c could issue a 404 which S/C would take to mean there are no reviews of that application yet. S/C would then allow Sheila to review the app and it would submit a new question to LP with her initial review. How r.u.c discovers this new question and associates it with the application review is discussed below. If Bob also submits a first review before r.u.c discovers Sheila's review, we'll now have two questions in LP which contain the reviews for Emacs.

We would have to expose an API in LP that r.u.c would call to merge the two questions. Probably the question with the lowest number would win. LP would merge the comments from the second question into the first, and then mark the second as invalid. r.u.c would know that the application is mapped to the first question.
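
A minimal sketch of the merge rule, assuming the lowest-numbered question wins and every higher-numbered duplicate is the one marked invalid. The Question class and merge_duplicates function are hypothetical stand-ins; no such LP API exists yet.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Question:
    """Hypothetical stand-in for an LP question holding review comments."""
    number: int
    comments: List[str] = field(default_factory=list)
    status: str = "review"


def merge_duplicates(questions: List[Question]) -> Question:
    """Fold duplicate review questions into the lowest-numbered one."""
    if not questions:
        raise ValueError("nothing to merge")
    ordered = sorted(questions, key=lambda q: q.number)
    winner, losers = ordered[0], ordered[1:]
    for loser in losers:
        # Copy the duplicate's reviews onto the winner, then retire it.
        winner.comments.extend(loser.comments)
        loser.status = "invalid"
    return winner
```

After the merge, r.u.c would map the application to the returned question only.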

The window of opportunity for this race is probably fairly small, since there are 30,000 reviewable applications in Ubuntu, but maybe only a few thousand very common ones. As the review database warms up, there will be fewer popular applications that have not yet been reviewed.

Pre-populate on first request

Another idea is that r.u.c could pre-populate the LP database whenever a review for a non-reviewed application is requested. For example, when Sheila initiates the first review of Emacs, r.u.c would synchronously create a new question for this review. Thus when Bob wants to review Emacs while Sheila is still typing hers, Bob's review will end up on the same question.

The downside of this approach is that we might have lots of questions without review comments. E.g. what if both Sheila and Bob abort their review before submitting it? We've now got an entry in the LP database for Emacs but with no content. We're also concerned that this will hammer the database more as it warms up with new reviews.
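
The pre-populate idea amounts to a get-or-create step that is serialized, so that two simultaneous first requests end up on the same question. A sketch under that assumption follows; QuestionRegistry and the create_question callback are hypothetical, and the callback stands in for whatever LP call would actually create the question.

```python
import threading
from typing import Callable, Dict, Tuple

# (distro, package_name, app_name, app_version)
AppKey = Tuple[str, str, str, str]


class QuestionRegistry:
    """Create at most one review question per application."""

    def __init__(self, create_question: Callable[[AppKey], int]) -> None:
        self._create_question = create_question
        self._lock = threading.Lock()
        self._questions: Dict[AppKey, int] = {}

    def get_or_create(self, key: AppKey) -> int:
        # Holding the lock across check-and-create closes the race between
        # two first-time reviewers of the same application.
        with self._lock:
            number = self._questions.get(key)
            if number is None:
                number = self._create_question(key)
                self._questions[key] = number
            return number
```

The cost noted above is visible here: the question is created as soon as anyone asks, whether or not a review is ever submitted.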

Adding a review

Bob wants to add a review for application Gnome-do, for which there is a robust comment history already. Bob's S/C makes a request to:

http://reviews.ubuntu.com/lucid/gnome-do/gnome-do/0.8.3.1/en

and gets a mass of XML in response. This is displayed in the S/C u/i. The question number for this review is given in the response. Bob uses S/C to enter his review of Gnome-do and hits submit. S/C will authenticate Bob to login.ubuntu.com via OpenID and create an OAuth application key for submitting his review. S/C will use launchpadlib to submit Bob's review as a comment on the question. It may provide some local hacks to display Bob's review immediately but other people will not see Bob's review for a little while.

Moderation

We do not yet have moderation for question comments exposed in the LP ui. Our intent is to enable this as the way special people can remove spam comments. The idea is to add a new team, e.g. ~software-center-moderators, as an LP celebrity, and to extend permission to edit (or maybe just disable) existing comments to this team. Thus trusted members of the Ubuntu community can be added to the team to moderate reviews.

Currently an API exists to edit bug comments, but there is not yet any ability to edit question comments. This would need to be added as well.

Limiting reviews to one-per-person

The above approach does not yet support limiting reviews to one-per-person. We could potentially build this into the submission API as a validity check for new review comments.
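
Such a validity check might look like the following sketch. The function and exception names are hypothetical, and the review records are plain dicts purely for illustration.

```python
from typing import Dict, List


class DuplicateReviewError(Exception):
    """Raised when a person tries to review the same application twice."""


def add_review(reviews: List[Dict], author: str, rating: int, summary: str) -> None:
    """Append a review unless this author already has one on the question."""
    if any(existing["author"] == author for existing in reviews):
        raise DuplicateReviewError("%s has already reviewed this application" % author)
    reviews.append({"author": author, "rating": rating, "summary": summary})
```

Whether a second submission should be rejected, as here, or should replace the first one is a policy choice these notes leave open.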

reviews.ubuntu.com

This is a new service we'd have to roll out that would scan LP for new review questions and comments, and build static XML files for vending to the vast Ubuntu usership. The advantage of this is that we can vend these XML files statically, so take advantage of load balancing, caching, etc. This will greatly reduce the read pressure on the LP database for review comments, as only r.u.c will generally query the relevant APIs.

r.u.c will probably run a cron script that will scan LP for new questions above a watermark, looking for questions that are specifically formatted as reviews. It can look for questions assigned to ~software-center-moderators that have a status of review, a new status which we will probably want to add.

The review status will be used to hide those questions from the web ui, unless specifically searched for, of course. This means we won't have to overload the invalid status.

So r.u.c will keep a watermark of the highest question number it's seen. It will do two cron tasks:

Scan for updates to existing review questions. r.u.c has a list of questions with review status so it needs to request updated comments for each of those questions. r.u.c can then append to the review XML and cache it for any future requests.
Scan for new review questions. r.u.c maintains a watermark of the highest question number it's seen to date. It then needs to request a list of new questions, with numbers higher than its watermark and a status of review. These it adds to its database mapping application 4-tuple to question number.
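
The two cron tasks can be sketched against in-memory data as follows. Everything here is hypothetical: Question and ReviewIndex are illustrative names, and the list passed to each scan stands in for whatever the LP API would return.

```python
from typing import Dict, Iterable, List, NamedTuple


class Question(NamedTuple):
    number: int
    status: str
    application: str      # "distro/pkgname/appname/appversion"
    comments: List[str]


class ReviewIndex:
    """State that r.u.c would keep between cron runs."""

    def __init__(self) -> None:
        self.watermark = 0
        self.question_for_app: Dict[str, int] = {}
        self.cached_comments: Dict[int, List[str]] = {}

    def scan_new(self, questions: Iterable[Question]) -> None:
        # Task 2: pick up review questions numbered above the watermark.
        start = self.watermark
        for q in questions:
            if q.number > start and q.status == "review":
                self.question_for_app[q.application] = q.number
                self.cached_comments[q.number] = list(q.comments)
                self.watermark = max(self.watermark, q.number)

    def scan_updates(self, questions: Iterable[Question]) -> None:
        # Task 1: append comments that arrived on already-known questions.
        by_number = {q.number: q for q in questions}
        for number, cached in self.cached_comments.items():
            q = by_number.get(number)
            if q is not None and len(q.comments) > len(cached):
                cached.extend(q.comments[len(cached):])
```

This sketch assumes comments are only ever appended; moderated or removed comments would need a fuller comparison than a length check.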

Question format

Questions with status review are specially formatted for use by S/C. Any improperly formatted question will be ignored, as will any improperly formatted comment.

Question summaries will be formatted using RFC 822 style key: value pairs:

Application: distro/pkgname/appname/appversion
Summary: Review of application Foo 5.8.1 in Lucid

Comments will have the following RFC 822 style key: value pairs:

Rating: 4
Summary: Great app, I love it!
Text:
 Gnome-do is the best thing I've ever used.

 My only complaint is that the icon is not purple enough.  Please
 make it more purply.

Normal comment metadata, such as the author and date can be used directly.
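
A small parser for the key: value format above might look like this. It is a sketch with hypothetical function names. It is hand-rolled rather than built on a standard RFC 822 parser because the example Text field contains a blank line, which a strict RFC 822 reader would treat as the end of the headers; here indented lines continue the previous key and blank lines inside a value are kept as paragraph breaks. Improperly formatted input yields None, so the caller can ignore it.

```python
from typing import Dict, Optional, Tuple


def parse_fields(text: str) -> Dict[str, str]:
    """Parse 'Key: value' lines; indented lines continue the previous key."""
    fields: Dict[str, list] = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            if key is not None:
                fields[key].append("")  # keep paragraph breaks inside a value
        elif line[0] in " \t":
            if key is None:
                raise ValueError("continuation line before any key")
            fields[key].append(line.strip())
        else:
            name, sep, value = line.partition(":")
            if not sep:
                raise ValueError("expected 'key: value', got %r" % line)
            key = name.strip()
            fields[key] = [value.strip()]
    return {k: "\n".join(v).strip() for k, v in fields.items()}


def parse_review_comment(text: str) -> Optional[dict]:
    """Return rating, summary and text, or None if the comment is malformed."""
    try:
        fields = parse_fields(text)
        return {
            "rating": int(fields["Rating"]),
            "summary": fields["Summary"],
            "text": fields.get("Text", ""),
        }
    except (KeyError, ValueError):
        return None


def parse_application(summary_text: str) -> Optional[Tuple[str, ...]]:
    """Extract (distro, pkgname, appname, appversion) from a question summary."""
    try:
        parts = parse_fields(summary_text)["Application"].split("/")
    except (KeyError, ValueError):
        return None
    return tuple(parts) if len(parts) == 4 else None
```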

XML format

Each reviewed application will be vended by r.u.c as a single XML file. The exact format of that XML file is TBD, but will be generated from a collation of RFC 822 summaries and comments for each question.
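
Since the XML format is still TBD, the following is only a placeholder sketch of the collation step. Every element and attribute name in it (reviews, review, question, rating, and so on) is invented for illustration and should not be read as the agreed format.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_review_xml(question_number: int, application: str, reviews: List[Dict]) -> str:
    """Collate parsed reviews for one application into a single XML string."""
    # The question number is included because clients need it to submit reviews.
    root = ET.Element("reviews", {
        "question": str(question_number),
        "application": application,
    })
    for review in reviews:
        node = ET.SubElement(root, "review", {
            "author": review["author"],
            "date": review["date"],
            "rating": str(review["rating"]),
        })
        ET.SubElement(node, "summary").text = review["summary"]
        ET.SubElement(node, "text").text = review["text"]
    return ET.tostring(root, encoding="unicode")
```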

API

The following APIs need to be added to Launchpad to support the functionality described above.

Create new question tied to (distro, pkgname, appname, appversion)
Create new comment for (distro, pkgname, appname, appversion)
Get all comments for (distro, pkgname, appname, appversion). An open question is what format this will be returned in. It must be as efficient as possible, but perhaps r.u.c can be the component that formats the response into the expected XML.
Mark question as review (or maybe this happens when new question is added) and invalid (for spam but maybe this happens through the normal LP web ui).
Get all summaries for review status questions with ids > watermark.

Concerns

Lucid is an LTS so once we decide on the external API for ratings and reviews, we've baked that in until the next LTS. Because using Answers is a bit of a hack, it means we'll have to live with this hack for a long time, unless we can abstract away the fact that we're using Answers underneath the hood.

This spec does an effective job of that for retrieving review data, because we're only relying on reviews.ubuntu.com, a specific URL scheme (which of course could be redirected at some point), and an XML structure.

However, for *submitting* reviews, we're exposing the use of Answers to the client. One solution for that is to define a rather generic ISubmitReview interface for the API above. That way we can implement reviews using a totally different mechanism without having to live with a crufty client for years.
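
One way to picture such an interface is sketched below. Only the name ISubmitReview comes from these notes; the method signature and the in-memory implementation are hypothetical. The sketch uses Python's abc module to stay self-contained, although the I-prefix suggests a Zope-style interface in practice.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple

# (distro, package_name, app_name, app_version)
AppKey = Tuple[str, str, str, str]


class ISubmitReview(ABC):
    """What the client sees: it never learns that Answers sits underneath."""

    @abstractmethod
    def submit_review(self, app: AppKey, language: str, author: str,
                      rating: int, summary: str, text: str) -> None:
        """Record one review for the given application."""


class InMemoryReviews(ISubmitReview):
    """Toy backend; an Answers-backed one would post a question comment."""

    def __init__(self) -> None:
        self.reviews: Dict[AppKey, List[dict]] = {}

    def submit_review(self, app, language, author, rating, summary, text):
        self.reviews.setdefault(app, []).append({
            "language": language,
            "author": author,
            "rating": rating,
            "summary": summary,
            "text": text,
        })
```

A client written against ISubmitReview would keep working if the backend moved away from Answers, which is the point of the abstraction.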

Revision 5 as of 2010-02-04 04:47:06
  Size: 10325
  Editor: barry
  Comment:
Revision 6 as of 2010-02-05 13:47:09
  Size: 11013
  Editor: sinzui
  Comment:

launchpad development

Diff for "RNRDesign"