ArchiveIndex

Revision 8 as of 2010-11-04 09:38:45

Archive index

As an Ubuntu packager
I want correct names, icons, and categories for applications to appear in Ubuntu Software Center automatically
so that I don’t need to remember to do it myself, or run the risk of messing it up.

As an application developer
I want to see the eventual application name, icon, category, etc. while the application is in my testing PPA
so that I can correct any errors before the application reaches one of the official repositories.

Ubuntu’s app-install-data-ubuntu and app-install-data-commercial packages, and for-purchase application metadata, should be replaced with a single automated system. Soyuz should produce for each archive it controls — including Multiverse, Canonical Partners, and every other PPA — an index of names, icons, summaries, categories, and keywords for all software items in the archive. (A “software item” in this sense mostly corresponds to a binary package, but in some cases one binary package contains multiple applications that should have separate information.) Launchpad should put this index in a standard place in the archive, and rebuild it whenever a package is added or changed.

Rationale

Since the beginning of Ubuntu’s Lucid cycle, we have wanted to get rid of app-install-data-ubuntu and app-install-data-commercial:

Ongoing cost of not doing this

Stakeholders

Matthew Paul Thomas, representing Michael Vogt and Brian Thomason

Implementation

The relevant data can be extracted by inspecting the deb package after the build (similar to how pot file extraction works). Alternatively, the extraction can run in batches by querying which new deb packages have become available since the last run of the extraction script. Note that additional sources outside the package itself may also be used (e.g. the popcon database or, in the future, user-generated metadata).

The implementation should be flexible, as there is more information that can be extracted. Initially we want the desktop file and the icon associated with it. Information about the commands that the package puts into /bin/ and /usr/bin, for use by command-not-found, is also interesting.
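
As a rough illustration of the desktop file step described above, the following Python sketch turns the text of a .desktop file into a dictionary of AppInfo fields. The function name `desktop_to_appinfo`, the choice of fields, and the mapping of `Name[de]` to `Name-de` are assumptions made for this example; they are not taken from the existing extractor.

```python
import configparser

# Desktop keys copied into the AppInfo record (an assumption for this sketch).
FIELDS = ("Icon", "Name", "Comment", "Exec", "Categories")


def desktop_to_appinfo(desktop_text, package, version):
    """Build one AppInfo record (a dict) from the text of a .desktop file."""
    parser = configparser.ConfigParser(
        interpolation=None,   # Exec lines may contain "%U" and similar codes
        delimiters=("=",),    # desktop files only use "=" as the separator
        strict=False,
    )
    parser.optionxform = str  # keep the original key case ("Name", not "name")
    parser.read_string(desktop_text)

    record = {"Package": package, "Version": version}
    for key, value in parser["Desktop Entry"].items():
        base, bracket, rest = key.partition("[")
        if base not in FIELDS:
            continue
        if bracket:
            # Localized key such as "Name[de]" becomes "Name-de".
            record[base + "-" + rest.rstrip("]")] = value
        else:
            record[base] = value
    return record
```

A real extractor would first unpack the deb to get at the desktop file; that step is left out here.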

File format

One file (potentially more in the future) should be generated alongside the Packages.gz file to store the additional relevant metadata from the desktop files. This file should be called "AppInfo" for the C locale and "AppInfo-$lang" for the other locales.

Space and the number of files (because of rsync) are issues for mirrors. The AppInfo/CommandNotFound data is small and textual, so it is no problem to publish it alongside the Packages file. The exact format does not matter to apt, which will just download it; other applications such as update-apt-xapian-index will parse it. To keep it consistent with the rest of the archive data, we should simply use RFC 822 format.

An example file might look like this:

Package: gnome-utils
Version: 2.23.1-0ubuntu1
Popcon: 17939
Section: main
Icon: baobab
Name: Disk Usage Analyzer
Comment: Check folder sizes and available disk space
Exec: baobab
Categories: GTK;GNOME;Utility;

and a localized one:

Package: gnome-utils
Version: 2.23.1-0ubuntu1
Popcon: 17939
Section: main
Icon: baobab
Name-de: Festplatten Überprüfer 
Comment-de: Überprüfen des verfügbaren Platzes
Exec: baobab
Categories: GTK;GNOME;Utility;
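
To show how a client such as update-apt-xapian-index might read files in this format, here is a minimal Python sketch that splits the text into stanzas and picks a localized name with a fallback. The function names `parse_appinfo` and `display_name` are hypothetical, and the sketch assumes single-line field values separated by blank lines, as in the examples above.

```python
def parse_appinfo(text):
    """Parse AppInfo text into a list of dicts, one per stanza."""
    records = []
    for stanza in text.strip().split("\n\n"):
        record = {}
        for line in stanza.splitlines():
            if not line.strip():
                continue  # tolerate extra blank lines between stanzas
            key, _, value = line.partition(":")
            record[key.strip()] = value.strip()
        if record:
            records.append(record)
    return records


def display_name(record, lang=None):
    """Return the name for `lang` if present, else the C locale name."""
    if lang and "Name-" + lang in record:
        return record["Name-" + lang]
    return record.get("Name")
```

In practice a client would merge the stanza from "AppInfo" with the matching one from "AppInfo-$lang" before calling something like `display_name`.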

Icons are relatively big (~5k/app; currently we have ~1800 icons = 7 MB), so it is not feasible to stuff them into a single file, especially if we expect a lot of churn (as on extra.ubuntu.com, which will also use this system). For this reason, they should be published as individual files on e.g. http://archive.ubuntu.com/ubuntu/dists/maverick/main/icons (or) http://archive.ubuntu.com/ubuntu/dists/maverick/icons

This means that it is the job of the client to dynamically fetch the icons from the local mirror and cache them. This is what is done on e.g. Android as well.
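
The client-side fetch-and-cache behaviour could look roughly like the Python sketch below. It assumes the per-component icon directory layout from the first example URL and a `<icon>.png` file name, neither of which is settled by this spec; `icon_url` and `cached_icon` are hypothetical helper names.

```python
import os
import urllib.request


def icon_url(mirror, suite, component, icon_name):
    """Build the icon URL, assuming a <icon>.png file name (not specified)."""
    return "%s/dists/%s/%s/icons/%s.png" % (
        mirror.rstrip("/"), suite, component, icon_name)


def cached_icon(icon_name, cache_dir, url, fetch=None):
    """Return the local path of an icon, downloading it only on a cache miss."""
    path = os.path.join(cache_dir, icon_name + ".png")
    if not os.path.exists(path):
        if fetch is None:
            # Default fetcher: plain HTTP GET from the mirror.
            def fetch(u):
                with urllib.request.urlopen(u) as response:
                    return response.read()
        os.makedirs(cache_dir, exist_ok=True)
        with open(path, "wb") as icon_file:
            icon_file.write(fetch(url))
    return path
```

The `fetch` parameter exists only so the download can be replaced, for example in tests or when a client already has its own HTTP layer. Cache invalidation when an icon changes upstream is not handled here.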

Overrides

A package may want to override the AppInfo for itself instead of using the desktop file. This should be supported on multiple levels.

It should be possible to blacklist a package via "XB-NoAppInfo: 1" in its control file. This means that the package will not be scanned for desktop files at all.

It should also be possible to override all of the AppInfo by having a debian/appinfo file in the source package. This file replaces the desktop extraction entirely: the system simply uses it as-is. This requires the extraction to look into the source package for this file first (if that is tricky to implement, we could move the file into the binary package at a known path).

Finally, if the .desktop file already contains "X-AppInfo" fields, the extractor should honor those and keep them. This is useful if e.g. X-AppInfo-Package would be wrong on extraction: if a desktop file lives in "wesnoth-common" but the package we want is "wesnoth", this is a good way to override it.
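
The three override levels above imply an order of precedence, which the following Python sketch makes explicit. The function `resolve_appinfo` and its arguments are hypothetical. The check for a bare "NoAppInfo" key reflects that dpkg strips the "XB-" prefix when writing the binary control file; which form the extractor actually sees depends on where it reads the field.

```python
X_PREFIX = "X-AppInfo-"


def resolve_appinfo(binary_package, control, desktop_fields,
                    source_appinfo=None):
    """Apply the override levels; return an AppInfo dict or None."""
    # Level 1: a blacklisted package is not scanned at all.
    if control.get("XB-NoAppInfo") == "1" or control.get("NoAppInfo") == "1":
        return None

    # Level 2: a debian/appinfo file replaces desktop extraction entirely.
    if source_appinfo is not None:
        return dict(source_appinfo)

    # Level 3: extracted desktop data, with X-AppInfo-* fields taking priority.
    record = {"Package": binary_package}
    overrides = {}
    for key, value in desktop_fields.items():
        if key.startswith(X_PREFIX):
            overrides[key[len(X_PREFIX):]] = value
        else:
            record[key] = value
    record.update(overrides)
    return record
```

For the wesnoth case, a desktop file shipped in "wesnoth-common" that carries `X-AppInfo-Package: wesnoth` yields a record whose Package field is "wesnoth".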

Extraction

The current data extractor can be found in lp:~mvo/archive-crawler/mvo. A similar (but more clever) approach to the one in this script may be taken to gather the data.

On a Soyuz machine with a full mirror, the script runs every hour (or two hours) and checks with the DB which new deb packages have become available since it last ran. Those are fetched and inspected, the results are written to a local sqlite database (or the LP database), and the icons are extracted and stored as well. Because all the data can be rebuilt by simply running the extraction again, it is probably enough to have a local cache DB. Then the script generates the AppInfo and AppInfo-$lang files and populates the icon directory. In this step it also needs to do orphan cleanup, i.e. removing packages that are no longer in the archive. An rsync cron job is then required to rsync the generated files onto archive.ubuntu.com.
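
One run of the cycle described above (store new records, clean up orphans, regenerate the file) might be sketched in Python as follows. This is not the existing archive-crawler code: the table layout and the names `open_cache` and `run_cycle` are invented for illustration, and keying the cache on the package name is a simplification that ignores binary packages containing several software items.

```python
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) the local cache database."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS appinfo ("
        "package TEXT PRIMARY KEY, version TEXT, stanza TEXT)")
    return db


def run_cycle(db, new_records, archive_packages):
    """Store new records, drop orphans, and return the AppInfo file text."""
    # 1. Insert or update the records extracted from newly published debs.
    for record in new_records:
        stanza = "\n".join("%s: %s" % item for item in record.items())
        db.execute(
            "INSERT OR REPLACE INTO appinfo VALUES (?, ?, ?)",
            (record["Package"], record["Version"], stanza))

    # 2. Orphan cleanup: remove packages that are no longer in the archive.
    cached = [row[0] for row in db.execute("SELECT package FROM appinfo")]
    for package in cached:
        if package not in archive_packages:
            db.execute("DELETE FROM appinfo WHERE package = ?", (package,))
    db.commit()

    # 3. Regenerate the file: stanzas separated by blank lines.
    rows = db.execute("SELECT stanza FROM appinfo ORDER BY package")
    return "\n\n".join(row[0] for row in rows) + "\n"
```

Icon extraction, the per-locale AppInfo-$lang files, and the rsync step are left out of this sketch.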

Icon extraction can be tricky as the icon may be stored in a different package than the desktop file (e.g. emacs-common vs emacs23).

Stuff we want to extract:

Roadmap

Issues

Open issues

Success

We will know we are done when the app-install-data-ubuntu and app-install-data-commercial packages are removed from the Ubuntu archive.