= Archive index =

'''As an''' Ubuntu packager<<BR>>
'''I want''' correct names, icons, and categories for applications to appear in Ubuntu Software Center automatically<<BR>>
'''so that''' I don’t need to remember to do it myself, or run the risk of messing it up.

'''As an''' application developer<<BR>>
'''I want''' to see the eventual application name, icon, category etc when the application is in my testing PPA<<BR>>
'''so that''' I can correct any errors before the application reaches one of the official repositories.

Ubuntu’s `app-install-data-ubuntu` and `app-install-data-commercial` packages, and for-purchase application metadata, should be replaced with a single automated system.

Soyuz should produce, for each archive it controls (including Multiverse, Canonical Partners, and every PPA), an index of names, icons, summaries, categories, and keywords for all software items in the archive. (A “software item” in this sense mostly corresponds to a binary package, but in some cases one binary package contains multiple applications that should have separate information.) Launchpad should put this index in a standard place in the archive, and rebuild it whenever a package is added or changed.

== Rationale ==

Since the beginning of Ubuntu’s Lucid cycle, we have wanted to get rid of `app-install-data-ubuntu` and `app-install-data-commercial`:

 * they are slow and difficult to update
 * the time we want to update `app-install-data-ubuntu` is when the archive is frozen
 * there are exceptions and bugs, so software shows up in Ubuntu Software Center that isn’t installable (or conversely, is hidden as a “technical item” when it isn’t)
 * they work only for Main, Universe, and Partner, not for PPAs, Multiverse, or third-party archives.

=== Ongoing cost of not doing this ===

 * Whenever a graphical application is added to Main or Universe, or its icon changes, Michael Vogt needs to rebuild the `app-install-data-ubuntu` package. Almost [[https://bugs.launchpad.net/ubuntu/+source/app-install-data-ubuntu/+bugs?field.searchtext=&orderby=-importance&search=Search&field.status:list=NEW&field.status:list=OPINION&field.status:list=EXPIRED&field.status:list=CONFIRMED&field.status:list=TRIAGED&field.status:list=INPROGRESS&field.status:list=FIXCOMMITTED&field.status:list=FIXRELEASED|every bug in this package]] represents a cost of not generating this index automatically.
 * Whenever a for-purchase application is added to the Ubuntu Software Center store, Brian Thomason or Michael Vogt needs to manually add metadata for the package to the Software Center Agent. This can’t scale beyond a few dozen applications per week. (They also need to register the price of the application; that needs to be automated separately.)
 * Whenever a graphical application is added to (or updated in) Canonical Partners, Brian Thomason needs to remember to rebuild the `app-install-data-commercial` package. Almost [[https://bugs.launchpad.net/ubuntu/+source/app-install-data-commercial/+bugs?field.searchtext=&orderby=-importance&search=Search&field.status:list=NEW&field.status:list=OPINION&field.status:list=EXPIRED&field.status:list=CONFIRMED&field.status:list=TRIAGED&field.status:list=INPROGRESS&field.status:list=FIXCOMMITTED&field.status:list=FIXRELEASED|every bug in that package]] represents a cost of not generating that index automatically. (In future, Canonical Partners may be merged with the for-purchase archive, but that would just change one kind of manual work to another.)
 * Whenever an open-source application goes through the post-release process, the packager needs to [[https://wiki.ubuntu.com/PostReleaseApps/Metadata|add custom metadata fields]] to `debian/control`. That metadata duplicates existing fields in the application’s `.desktop` file, so the packager shouldn’t have to do this.

== Stakeholders ==

Matthew Paul Thomas, representing Michael Vogt and Brian Thomason

== Implementation ==

The relevant data should be extracted by inspecting `.deb` packages, either immediately after each package is built (similar to what the `.pot` file extraction does), or in batches by querying which new `.deb` packages have become available since the last run of the extraction script. In future, additional sources may be used that are not inside the package itself (e.g. the `popcon` database, or future [[UserContributedMetadata|user-contributed metadata]]).
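
The batch variant could be sketched roughly as follows. This is only an illustration, not the Soyuz implementation: the function name `select_new_debs`, the `(filename, published_at)` record shape, and the sample filenames and timestamps are all assumptions made for the example. A real script would query the Launchpad database rather than an in-memory list.

```python
from datetime import datetime


def select_new_debs(published, last_run):
    """Return names of .deb files published strictly after last_run, oldest first.

    `published` is a hypothetical iterable of (filename, published_at) pairs.
    """
    fresh = [(published_at, name)
             for name, published_at in published
             if name.endswith(".deb") and published_at > last_run]
    return [name for _, name in sorted(fresh)]


# Illustrative records only; filenames and timestamps are made up.
published = [
    ("baobab_2.23.1_i386.deb", datetime(2010, 6, 1, 12, 0)),
    ("wesnoth_1.8_i386.deb", datetime(2010, 6, 1, 14, 30)),
    ("gnome-utils_2.23.1.dsc", datetime(2010, 6, 1, 15, 0)),
]
last_run = datetime(2010, 6, 1, 13, 0)

new_debs = select_new_debs(published, last_run)
print(new_debs)
```

Recording the time of the last run, rather than the list of packages already seen, keeps the state the script has to carry between runs small.
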
The implementation should be flexible, as there is more information that can be extracted. Initially we want the `.desktop` file and the icon associated with it. But information about the commands that the package puts into `/bin` and `/usr/bin`, for `command-not-found`, is also interesting.

=== File format ===

A file should be generated alongside the `Packages.gz` file, to store the additional relevant metadata from the desktop files. This file should be called `AppInfo` for the `C` locale, and `AppInfo-`''$lang'' for other locales. (Potentially more files may be generated in the future.)

The file should use RFC-822-style format, for consistency with the `Packages` file (though the exact format does not matter to `apt`, since it will just download the file, while other programs like `update-apt-xapian-index` will parse it). For `rsync`ed archive mirrors, space and number of files are issues, but the AppInfo/CommandNotFound data is small and textual, so it is no problem to use RFC-822.

An example file might look like this:

{{{
Package: gnome-utils
Version: 2.23.1-0ubuntu1
Popcon: 17939
Section: main
Icon: baobab
Name: Disk Usage Analyzer
Comment: Check folder sizes and available disk space
Exec: baobab
Categories: GTK;GNOME;Utility;
}}}

and a localized one:

{{{
Package: gnome-utils
Version: 2.23.1-0ubuntu1
Popcon: 17939
Section: main
Icon: baobab
Name-de: Festplatten Überprüfer
Comment-de: Überprüfen des verfügbaren Platzes
Exec: baobab
Categories: GTK;GNOME;Utility;
}}}

Icons are relatively big (~5 KB/app × ~1800 icons ≈ 9 MB), so it is not feasible to stuff them into a single file, especially if we expect a lot of churn (like on `extras.ubuntu.com`, which will also use this system). For this reason, they should be published as individual files, for example under `http://archive.ubuntu.com/ubuntu/dists/maverick/main/icons` or `http://archive.ubuntu.com/ubuntu/dists/maverick/icons`.
Then it will be the job of a client to dynamically fetch needed icons from the local mirror and cache them (as is done on Android, for example).

=== Overrides ===

A package may want to override the `AppInfo` for itself, instead of using the `.desktop` file. This should be supported on multiple levels.

 * It should be possible to blacklist a package via `XB-NoAppInfo: 1` in its control file. This means that the package will not be scanned for `.desktop` files at all.
 * It should also be possible to override all of the appinfo by having a `debian/appinfo` file in the source package. This bypasses the desktop extraction entirely and forces the system to simply use this file. It requires the extraction to look into the source package for this file first (if that is tricky to implement, we could move the file into the binary package at a known path).
 * Finally, if the `.desktop` file already contains `X-AppInfo-`''*'' fields, the extractor should honor those and keep them. This is useful if (for example) `X-AppInfo-Package` is wrong on extraction. For example, if the `.desktop` file for the `wesnoth` game lives in `wesnoth-common`, this is a good way to override it.

=== Extraction ===

The current data extractor can be found in [[https://code.launchpad.net/~mvo/archive-crawler/mvo|lp:~mvo/archive-crawler/mvo]]. An approach similar to that script's (but cleverer) may be taken to gather the data.

On a Soyuz machine with a full mirror, this script runs every hour (or two hours), and checks with the DB which new `.deb` packages have become available since it last ran. Those are fetched and inspected, the results are written to a local `sqlite` database (or the LP database), and the icons are extracted and stored as well. Because all the data can be rebuilt by simply running the extraction again, it’s probably enough to have a local cache DB.

Then the script generates the `AppInfo` and `AppInfo-`''$lang'' files and populates the icon directory.
In this step it also needs to do orphan cleanup, ''i.e.'' removing packages that are no longer in the archive. An `rsync` `cron` job is then required to sync the generated files onto `archive.ubuntu.com`.

Icon extraction can be tricky, as the icon may be stored in a different package than the desktop file (e.g. emacs-common vs emacs23).

Stuff we want to extract:

 * desktop file:
  * `appname` (potentially multi language)
  * `packagename`
  * `pkgversion` (to ensure we can validate our data is not stale)
  * `Comment` (friendly summary) - multi language
  * `popcon` - probably not needed anymore once we have ratings
  * `keywords` (X-AppInstall-Keywords)
  * `iconname`
  * `Categories`
  * `mime-type`
  * `codec-info`
 * Icons
  * as individual files that get dynamically fetched
 * command not found data
  * packagename -> binaries
  * PROBLEM: diverts etc., a real-world problem for e.g. vim

=== Roadmap ===

 * Start with non-localized data and only for PPAs
 * Expand to the full archive
  * iteratively survey the differences between app-install-data and the metadata Soyuz is producing
  * fix bugs in the packages and/or in Soyuz

=== Issues ===

 * A `.desktop` file is in a separate package from the package you're actually interested in
  * e.g. wesnoth-data vs. wesnoth
  * e.g. emacs-common contains the icon for emacs22
  * this should be fixed in the packages themselves with the override mechanism
 * An icon file is in a separate package from the desktop file
  * check how common that actually is
 * Keywords/comments should be available in Rosetta for translation, ideally as an additional template in the Ubuntu package translation page
 * one package may contain multiple apps
 * app names are not unique (e.g. Terminal is used multiple times).
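
As a rough illustration of the extraction and of the `X-AppInfo-Package` override mentioned above, the following sketch turns the text of one `.desktop` file into one `AppInfo` stanza. It is not the existing archive-crawler code: the function name `desktop_to_stanza`, the choice of copied fields, and the sample desktop entry (including its name, icon, and version values) are assumptions made for the example. It uses Python's standard `configparser` module, which happens to cope with simple desktop entries.

```python
import configparser

# Desktop-entry keys copied into the stanza, in the order used by the example file.
FIELDS = ("Icon", "Name", "Comment", "Exec", "Categories")


def desktop_to_stanza(desktop_text, package, version):
    """Build one RFC-822-style AppInfo stanza from the text of a .desktop file."""
    # interpolation=None: Exec lines may contain '%' codes such as %U.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case (configparser lowercases by default)
    parser.read_string(desktop_text)
    entry = parser["Desktop Entry"]
    # Honor an explicit override if the packager supplied one.
    package = entry.get("X-AppInfo-Package", fallback=package)
    lines = [f"Package: {package}", f"Version: {version}"]
    lines += [f"{key}: {entry[key]}" for key in FIELDS if key in entry]
    return "\n".join(lines) + "\n"


# Made-up sample entry: the .desktop file ships in wesnoth-common,
# but the stanza should point at the wesnoth package.
SAMPLE = """\
[Desktop Entry]
Name=Battle for Wesnoth
Exec=wesnoth
Icon=wesnoth-icon
Categories=Game;StrategyGame;
X-AppInfo-Package=wesnoth
"""

stanza = desktop_to_stanza(SAMPLE, package="wesnoth-common", version="1.8-1")
print(stanza)
```

Missing keys (here `Comment`) are simply left out of the stanza. Localized keys, `Popcon`, and `Section` are not handled in this sketch.
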
=== Open issues ===

 * need to discuss and create a standard for what kind of metadata can be supplied within the package, and finalize the override mechanism
  * `debian/control` modifications
  * `debian/something.desktop` with `X-App-Install` tags
  * `debian/something.something` for `command-not-found` hints
 * provide a way (LP/external site) to allow easy modifications/cleanup of metadata (improve descriptions, improve categories etc)
 * Things this metadata might contain in the future:
  * hardware/software requirements (opengl etc)
  * whether it’s available in my language (may change after package upload since translation packages are different)

== Success ==

We will know we are done when the `app-install-data-ubuntu` and `app-install-data-commercial` packages are removed from the Ubuntu archive.