Archive index

As an Ubuntu packager
I want correct names, icons, and categories for applications to appear in Ubuntu Software Center automatically
so that I don’t need to remember to do it myself, or run the risk of messing it up.

As an application developer
I want to see the eventual application name, icon, category, etc. when the application is in my testing PPA
so that I can correct any errors before the application reaches one of the official repositories.

Ubuntu’s app-install-data-ubuntu and app-install-data-commercial packages, and the metadata for for-purchase applications, should be replaced with a single automated system. For each archive it controls (including Multiverse, Canonical Partners, and every PPA), Soyuz should produce an index of names, icons, summaries, categories, and keywords for all software items in the archive. (A “software item” in this sense mostly corresponds to a binary package, but in some cases one binary package contains multiple applications that should have separate information.) Launchpad should put this index in a standard place in the archive, and rebuild it whenever a package is added or changed.

Rationale

Since the beginning of Ubuntu’s Lucid cycle, we have wanted to get rid of app-install-data-ubuntu and app-install-data-commercial:

Ongoing cost of not doing this

Stakeholders

Matthew Paul Thomas, representing Michael Vogt and Brian Thomason

Implementation

The relevant data should be extracted by inspecting .deb packages, either immediately after each package is built (similar to what the .pot file extraction does) or in batches, by querying which new .deb packages have become available since the extraction script last ran. In future, additional sources that are not inside the package itself may be used (e.g. the popcon database, or future user-contributed metadata).
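A .deb is an ar archive whose data.tar.* member holds the installed files, so the inspection step can be sketched with the Python standard library alone. This is a minimal illustration, not the archive-crawler code: the function names are hypothetical, it only collects .desktop files under /usr/share/applications, and it relies on tarfile detecting the compression itself, which covers gzip, bzip2, and xz.

```python
import io
import tarfile


def ar_members(blob):
    """Yield (name, data) for each member of an ar archive held in memory."""
    if not blob.startswith(b"!<arch>\n"):
        raise ValueError("not an ar archive")
    offset = 8
    while offset + 60 <= len(blob):
        header = blob[offset:offset + 60]
        if header[58:60] != b"`\n":
            raise ValueError("corrupt ar member header")
        # The name field is space-padded; some tools also append a trailing slash.
        name = header[0:16].decode("ascii").rstrip().rstrip("/")
        size = int(header[48:58].decode("ascii"))
        start = offset + 60
        yield name, blob[start:start + size]
        # Member data is padded to an even length.
        offset = start + size + (size % 2)


def desktop_files(deb_bytes):
    """Return {path: text} for the .desktop files shipped in a .deb (hypothetical helper)."""
    for name, data in ar_members(deb_bytes):
        if not name.startswith("data.tar"):
            continue
        found = {}
        # "r:*" lets tarfile detect gzip, bzip2 or xz compression on its own.
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tar:
            for member in tar.getmembers():
                path = member.name[2:] if member.name.startswith("./") else member.name
                if (member.isfile()
                        and path.startswith("usr/share/applications/")
                        and path.endswith(".desktop")):
                    found[path] = tar.extractfile(member).read().decode("utf-8")
        return found
    raise ValueError("no data.tar member found")
```

The same loop could also collect icon files; reading the whole .deb into memory keeps the sketch short but would need streaming for very large packages.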

The implementation should be flexible, as more information can be extracted than we need at first. Initially we want the .desktop file and the icon associated with it, but the list of commands that a package installs into /bin and /usr/bin (for command-not-found) is also interesting.
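One way to keep the extraction flexible is a registry of small extractor functions, so that supporting a new kind of data (such as command-not-found commands) means registering one more function. The sketch below is hypothetical: the registry, the function names, and the input (a list of file paths from a package’s data.tar, without any leading "./" or "/") are all illustrative assumptions.

```python
from typing import Callable, Dict, List

# Hypothetical registry: maps a kind of data to a function over a package's file list.
EXTRACTORS: Dict[str, Callable[[List[str]], List[str]]] = {}


def extractor(kind):
    """Decorator that registers a function as the extractor for `kind`."""
    def register(func):
        EXTRACTORS[kind] = func
        return func
    return register


@extractor("desktop")
def desktop_entries(paths):
    # Only locates .desktop files; reading their contents is a separate step.
    return sorted(p for p in paths
                  if p.startswith("usr/share/applications/") and p.endswith(".desktop"))


@extractor("commands")
def commands(paths):
    # Names of files placed directly in /bin or /usr/bin, for command-not-found.
    names = set()
    for p in paths:
        directory, _, name = p.rpartition("/")
        if directory in ("bin", "usr/bin") and name:
            names.add(name)
    return sorted(names)


def extract_all(paths):
    """Run every registered extractor over one package's file list."""
    return {kind: func(paths) for kind, func in EXTRACTORS.items()}
```

For example, extract_all(["usr/bin/baobab", "usr/share/applications/baobab.desktop"]) returns one list per registered kind, so later extractors (popcon, user-contributed data) would not change the callers.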

File format

A file should be generated alongside the Packages.gz file, to store the additional relevant metadata from the desktop files. This file should be called AppInfo for the C locale, and AppInfo-$lang for other locales. (Potentially more files may be generated in the future.)

The file should use an RFC-822-style format, for consistency with the Packages file (though the exact format does not matter to apt, which will just download the file; other programs, such as update-apt-xapian-index, will parse it). For rsynced archive mirrors, disk space and the number of files are concerns, but the AppInfo/CommandNotFound data is small and textual, so using RFC-822 is no problem.
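Writing the files needs little more than a formatter for RFC-822-style stanzas plus the naming rule for locales. This minimal sketch assumes each software item is already a dict of field names to values; the function names are hypothetical, and the continuation-line rule (a leading space, with "." standing for an empty line) follows the Debian control-file convention rather than anything this spec prescribes.

```python
def index_filename(lang=None):
    """AppInfo for the C locale, AppInfo-$lang for any other locale."""
    return "AppInfo" if lang in (None, "C") else f"AppInfo-{lang}"


def format_stanza(fields):
    """Render one software item as 'Key: value' lines."""
    lines = []
    for key, value in fields.items():
        first, *rest = str(value).split("\n")
        lines.append(f"{key}: {first}")
        # Continuation lines start with a space; an empty line is written as " .".
        lines.extend(" " + (line if line else ".") for line in rest)
    return "\n".join(lines) + "\n"


def format_index(stanzas):
    """Join stanzas with one blank line between them, as in Packages."""
    return "\n".join(format_stanza(s) for s in stanzas)
```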

An example file might look like this:

Package: gnome-utils
Version: 2.23.1-0ubuntu1
Popcon: 17939
Section: main
Icon: baobab
Name: Disk Usage Analyzer
Comment: Check folder sizes and available disk space
Exec: baobab
Categories: GTK;GNOME;Utility;

and a localized one:

Package: gnome-utils
Version: 2.23.1-0ubuntu1
Popcon: 17939
Section: main
Icon: baobab
Name-de: Festplatten Überprüfer 
Comment-de: Überprüfen des verfügbaren Platzes
Exec: baobab
Categories: GTK;GNOME;Utility;
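A client such as update-apt-xapian-index then has to read these stanzas back. The parser below is a minimal sketch, not the parser any existing tool uses; it assumes the format shown above, and the `localized` helper, which falls back from a field such as Name-de to Name, is a hypothetical convenience.

```python
def parse_index(text):
    """Parse RFC-822-style stanzas into a list of dicts."""
    stanzas, current, last_key = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current stanza.
            if current:
                stanzas.append(current)
            current, last_key = {}, None
        elif line[0] in " \t":
            # Continuation line: append to the previous field (" ." is an empty line).
            if last_key is None:
                raise ValueError("continuation line without a field")
            extra = line[1:]
            current[last_key] += "\n" + ("" if extra == "." else extra)
        else:
            key, sep, value = line.partition(":")
            if not sep:
                raise ValueError(f"malformed line: {line!r}")
            last_key = key.strip()
            current[last_key] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas


def localized(stanza, field, lang):
    """Return e.g. Name-de if present, otherwise the untranslated Name."""
    return stanza.get(f"{field}-{lang}", stanza.get(field))
```

Surrounding whitespace in values is stripped, so a stray trailing space after a translated name does not leak into the user interface.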

Icons are relatively big (~5 KB per icon × ~1800 icons ≈ 9 MB), so it is not feasible to stuff them into a single file, especially if we expect a lot of churn (as on extras.ubuntu.com, which will also use this system). For this reason, they should be published as individual files, for example in http://archive.ubuntu.com/ubuntu/dists/maverick/main/icons or http://archive.ubuntu.com/ubuntu/dists/maverick/icons. It will then be the job of a client to fetch the icons it needs from the local mirror on demand and cache them (as is done on Android, for example).
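The client side of this arrangement could look like the sketch below: look in a local cache first, and fetch an icon from the mirror only when it is missing. It assumes icons are published flat under one directory URL (one of the layouts suggested above) and named by file name; the function name and cache layout are hypothetical. Any URL scheme that urllib supports works, including file:// for a local mirror.

```python
import shutil
import urllib.parse
import urllib.request
from pathlib import Path


def fetch_icon(base_url, icon_file, cache_dir):
    """Return a cached copy of `icon_file`, downloading it from `base_url` if needed."""
    if "/" in icon_file or icon_file in ("", ".", ".."):
        raise ValueError(f"unexpected icon file name: {icon_file!r}")
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / icon_file
    if target.exists():
        return target  # cache hit: no network access at all
    url = base_url.rstrip("/") + "/" + urllib.parse.quote(icon_file)
    partial = target.with_name(target.name + ".part")
    with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
        shutil.copyfileobj(response, out)
    # Rename only after a complete download, so a failed fetch never looks cached.
    partial.replace(target)
    return target
```

This sketch never expires cached icons; a real client would also need to notice when an icon changes in the archive.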

Overrides

A package may want to override the AppInfo for itself, instead of using the .desktop file. This should be supported on multiple levels.
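The levels are not specified here. As a hypothetical illustration only, suppose each level (for example, data derived from the .desktop file, an override shipped by the package, and an override set by archive administrators) is a dict of AppInfo fields, applied in order so that later levels win, with None meaning “drop this field”.

```python
def effective_appinfo(*levels):
    """Merge AppInfo field dicts from lowest to highest precedence (hypothetical levels).

    A later level overrides fields from earlier ones; a value of None removes the field.
    The input dicts are not modified.
    """
    merged = {}
    for level in levels:
        for key, value in level.items():
            if value is None:
                merged.pop(key, None)
            else:
                merged[key] = value
    return merged
```

Calling effective_appinfo(from_desktop_file, from_package, from_admin) would then yield the fields to publish in AppInfo.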

Extraction

The current data extractor can be found in lp:~mvo/archive-crawler/mvo. An approach similar to this script’s, but cleverer, may be taken to gather the data.

On a Soyuz machine with a full mirror, this script runs every hour or two and asks the database which new .deb packages have become available since its last run. Those packages are fetched and inspected, their data is written to a local SQLite database (or the Launchpad database), and their icons are extracted and stored as well. Because all the data can be rebuilt simply by running the extraction again, a local cache database is probably enough. The script then generates the AppInfo and AppInfo-$lang files and populates the icon directory. In this step it also needs to do orphan cleanup, i.e. remove data for packages that are no longer in the archive. An rsync cron job is then required to sync the generated files onto archive.ubuntu.com.
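The cache, orphan-cleanup, and generation steps can be sketched with Python’s built-in sqlite3 module. This is an illustration under assumptions, not the archive-crawler schema: the table layout and function names are hypothetical, the query that asks Soyuz for new packages is left out because its interface is not described here, and `entries` maps a .desktop file ID to the fields already extracted from it.

```python
import sqlite3

# Hypothetical field set, in the order used in the example AppInfo stanza.
FIELDS = ("Icon", "Name", "Comment", "Exec", "Categories")
COLUMNS = ", ".join(f'"{field.lower()}"' for field in FIELDS)


def open_cache(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS appinfo (package TEXT NOT NULL, "
        "version TEXT NOT NULL, desktop_id TEXT NOT NULL, "
        f"{COLUMNS}, PRIMARY KEY (package, desktop_id))"
    )
    return db


def store_package(db, package, version, entries):
    """Replace everything cached for `package` with freshly extracted entries."""
    placeholders = ", ".join("?" * (3 + len(FIELDS)))
    with db:  # one transaction: commit on success, roll back on error
        db.execute("DELETE FROM appinfo WHERE package = ?", (package,))
        db.executemany(
            f"INSERT INTO appinfo (package, version, desktop_id, {COLUMNS}) "
            f"VALUES ({placeholders})",
            [(package, version, desktop_id, *(fields.get(f) for f in FIELDS))
             for desktop_id, fields in entries.items()],
        )


def remove_orphans(db, archive_packages):
    """Delete cached data for packages no longer in the archive; return their names."""
    cached = {row[0] for row in db.execute("SELECT DISTINCT package FROM appinfo")}
    orphans = cached - set(archive_packages)
    with db:
        db.executemany("DELETE FROM appinfo WHERE package = ?",
                       [(p,) for p in sorted(orphans)])
    return orphans


def generate_appinfo(db):
    """Render the C-locale AppInfo text, one stanza per .desktop entry."""
    stanzas = []
    rows = db.execute(
        f"SELECT package, version, {COLUMNS} FROM appinfo ORDER BY package, desktop_id")
    for package, version, *values in rows:
        lines = [f"Package: {package}", f"Version: {version}"]
        lines += [f"{key}: {value}" for key, value in zip(FIELDS, values) if value]
        stanzas.append("\n".join(lines) + "\n")
    return "\n".join(stanzas)
```

Deleting a package’s rows before inserting the new ones means a .desktop file dropped in a newer upload also disappears from the index, not only whole packages removed from the archive.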

Icon extraction can be tricky, as the icon may be stored in a different package than the desktop file (e.g. emacs-common vs emacs23).
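One way to resolve such cross-package icons is to look the Icon= value up in an archive-wide map from file path to package (similar in spirit to a Contents file). The sketch below is hypothetical: the search directories, the extensions, and the ranking (prefer the desktop file’s own package, then the largest fixed-size hicolor icon) are illustrative choices, not rules from this spec or from the freedesktop icon theme specification.

```python
import re

# Illustrative search locations and extensions (assumptions, not a spec).
ICON_DIRS = ("usr/share/icons/hicolor/", "usr/share/pixmaps/")
EXTENSIONS = (".png", ".svg", ".xpm")


def find_icon(icon, desktop_package, contents):
    """Return (package, path) of the best file for an Icon= value, or None.

    `contents` maps archive file paths (no leading "/") to the package shipping them.
    """
    if icon.startswith("/"):
        # An absolute Icon= value names the file directly.
        path = icon.lstrip("/")
        return (contents[path], path) if path in contents else None
    wanted = {icon} if icon.endswith(EXTENSIONS) else {icon + ext for ext in EXTENSIONS}
    candidates = [(pkg, path) for path, pkg in contents.items()
                  if path.startswith(ICON_DIRS) and path.rsplit("/", 1)[-1] in wanted]
    if not candidates:
        return None

    def rank(candidate):
        pkg, path = candidate
        size = re.search(r"/(\d+)x\d+/", path)
        # Prefer the desktop file's own package, then the largest fixed-size icon.
        return (pkg == desktop_package, int(size.group(1)) if size else 0)

    return max(candidates, key=rank)
```

With the emacs case above, the lookup for Icon=emacs23 from the emacs23 package would return a file shipped by emacs-common, which the extractor then has to fetch from that other package.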

Stuff we want to extract:

Roadmap

Issues

Open issues

Success

We will know we are done when the app-install-data-ubuntu and app-install-data-commercial packages are removed from the Ubuntu archive.

ArchiveIndex (last edited 2010-11-04 10:31:26 by mpt)