Translations/Specs/MessageSharing


Sharing Messages Between Translations

POTMsgSets are currently contained in POTemplates: each POTMsgSet belongs to exactly one template. If we change that to an m:n relationship, a single POTMsgSet can occur in multiple templates, and its translations can be shared among all translations of those templates into the same language.
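The m:n relationship can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not Launchpad code. The dataclasses stand in for the real database classes, and POTMsgSet is reduced to its msgid. The helper translations_for() is hypothetical. A translation is assumed to be identified by its POTMsgSet and language rather than by its template. The template name "evolution" and the series "hardy" and "intrepid" are example values only.

```python
# Hypothetical sketch: plain dataclasses standing in for Launchpad's
# database classes. Only the class names come from the spec.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class POTMsgSet:
    msgid: str  # simplified: just the English source string


@dataclass
class POTemplate:
    name: str
    series: str
    # m:n relationship: one POTMsgSet object may be listed in many templates.
    msgsets: List[POTMsgSet] = field(default_factory=list)


@dataclass
class TranslationMessage:
    potmsgset: POTMsgSet
    language: str
    translation: str


def translations_for(template: POTemplate, language: str,
                     messages: List[TranslationMessage]) -> Dict[str, str]:
    """Collect translations for one template by looking messages up via
    (POTMsgSet, language), so every template containing a POTMsgSet sees
    the same translation."""
    index = {(m.potmsgset, m.language): m for m in messages}
    result = {}
    for msgset in template.msgsets:
        message = index.get((msgset, language))
        if message is not None:
            result[msgset.msgid] = message.translation
    return result


hello = POTMsgSet("Hello")
hardy = POTemplate("evolution", "hardy", [hello])
intrepid = POTemplate("evolution", "intrepid", [hello])
messages = [TranslationMessage(hello, "nl", "Hallo")]

# One TranslationMessage serves both templates.
print(translations_for(hardy, "nl", messages))     # {'Hello': 'Hallo'}
print(translations_for(intrepid, "nl", messages))  # {'Hello': 'Hallo'}
```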

In this document we only consider sharing between "generations" of the same template, across series of either a product or a distribution. Sharing between distribution source packages and products is left for later.

With message sharing, a new product or distro series can have its translations populated easily and, if desired, automatically. The current schema requires well over 20 million TranslationMessages to be copied for every new Ubuntu release. In theory that can be done in a day, but locking conflicts can stretch it to several days in practice. Sharing messages reduces this workload by orders of magnitude.
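Sharing makes opening a series cheap, and a hypothetical sketch shows why. It models the POTMsgSet-POTemplate linking table as flat tuples. Opening a series copies only the link rows for the new series. The TranslationMessages stay untouched because they are keyed by POTMsgSet and language, not by template. The function open_series() and the tuple layouts are illustrative assumptions, not Launchpad's schema or API.

```python
# Hypothetical sketch: the linking table as flat tuples. Names are
# illustrative only, not Launchpad's schema or API.
from typing import List, Tuple

# (series, template_name, potmsgset_id)
Link = Tuple[str, str, int]
# (potmsgset_id, language, translation): keyed by POTMsgSet, not template.
Message = Tuple[int, str, str]


def open_series(links: List[Link], source: str, target: str) -> List[Link]:
    """Return the linking-table rows for a new series `target`, copied from
    series `source`. TranslationMessages are not touched."""
    return [(target, template, msgset_id)
            for (series, template, msgset_id) in links
            if series == source]


links: List[Link] = [("hardy", "evolution", 1), ("hardy", "evolution", 2)]
messages: List[Message] = [(1, "nl", "Hallo"), (2, "nl", "Tot ziens")]

new_links = open_series(links, "hardy", "intrepid")
links.extend(new_links)

# Two new link rows; the translations are reachable from both series
# without copying a single TranslationMessage.
print(len(new_links), len(messages))  # 2 2
```

Without sharing, each message row would have to be duplicated for the new series instead. That is the copying cost described above.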

XXX: Copy design description here

Prior work

Expectations

Based on our existing data, we expect to reduce the number of rows in the TranslationMessage table by up to three quarters and in POTMsgSet by two thirds. The linking table will have about as many rows as the existing POTMsgSet table, but it will be much narrower and contain no variable-sized columns. The reduction is not crucial at the current database size and server memory, but it means that each future product or distro series will grow the database much less.
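The expected reductions are simple arithmetic, sketched below. This sketch treats both figures as best-case reductions. It also assumes the linking table (TranslationTemplateItem) gets one row per pre-migration POTMsgSet. The input row counts are made up for illustration and are not measurements.

```python
# Hypothetical back-of-the-envelope estimate; treats both stated
# reductions as best-case figures.
from fractions import Fraction


def estimate_after_sharing(translationmessage_rows: int,
                           potmsgset_rows: int) -> dict:
    """Estimated row counts after migration: TranslationMessage shrinks by
    up to 3/4, POTMsgSet by 2/3, and the linking table gets roughly one
    row per existing POTMsgSet."""
    return {
        "TranslationMessage": int(translationmessage_rows * (1 - Fraction(3, 4))),
        "POTMsgSet": int(potmsgset_rows * (1 - Fraction(2, 3))),
        "TranslationTemplateItem": potmsgset_rows,
    }


# Made-up input sizes, for illustration only.
print(estimate_after_sharing(40_000_000, 6_000_000))
# {'TranslationMessage': 10000000, 'POTMsgSet': 2000000, 'TranslationTemplateItem': 6000000}
```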

We expect to be able to perform most of the data migration without taking the system offline.

Plan

(Blueprint: message-sharing).

Implementation steps:

  1. Schema additions. Create a linking table between POTMsgSet and POTemplate, and add several columns. (message-sharing-schema-additions, 2 story points)

  2. Python code update to initialize the new columns and the linking table on newly-created messages. (message-sharing-initialize-new)

  3. Script to populate linking table and new columns in the background (for existing messages). (message-sharing-populate, 2 story points)

  4. Python code update to start using the linking table and new columns. Add "NOT NULL" constraints to TranslationMessage.potemplate and TranslationMessage.language. (message-sharing-switchover, 10 story points)

    • UI change: show whether translations/suggestions are shared or diverged. (No change to how suggestions are sorted.)
    • Change updateTranslations():

      • converging: make the old message non-current and clear the potemplate field
      • staying shared: "move" the is_current bit
      • staying diverged: duplicate the TranslationMessage if needed and set the potemplate field

    • Update export code.
    • Update super-fast-imports cache.
    • Change statistics computation.
    • When creating a POFile, create it in all series of a distro/product.

  5. Schema deletions. A few columns will be obsolete at this point. (message-sharing-schema-deletions, 4 story points)

  6. Script to merge identical messages into shared ones without taking the system as a whole offline, although we may want to take Ubuntu translations offline for the duration. (message-sharing-migration, 6 story points)

    • Replace POTMsgSets with linking-table entries.

    • Mark translations in the focus series as the shared ones.
    • Find and remove redundant POTMsgSets and TranslationMessages.

    • Periodically check for redundant divergence.
    • Batch by the same criteria as sharing itself: template name and product/distro name.
    • Clear potemplate field on suggestions.

    • Constraint: TranslationMessages that are neither imported nor current can't be diverged.

  7. Redo cross-series translation copying (4 story points).
    • Copy templates, linking-table entries, POFiles (but not TranslationMessages).

    • Run live for product series.
    • Copy from the current translation focus when opening a new series; if that is not the best choice, people can always upload a template.

  8. Automate the merging and splitting of shared POTMsgSets that certain template-renaming scenarios may require. (message-sharing-template-renaming)
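The three updateTranslations() cases from step 4 can be sketched for a single POTMsgSet in a single language. This is a hypothetical model, not Launchpad's implementation. A message with potemplate None is treated as shared, and a message with potemplate set is diverged for that template. The share flag, the Message class and the helper names are assumptions made for illustration.

```python
# Hypothetical sketch of the three updateTranslations() cases for a single
# POTMsgSet in a single language. Not Launchpad's actual implementation.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Message:
    translation: str
    potemplate: Optional[str]  # None: shared; otherwise diverged for that template
    is_current: bool = False


def _find_or_create(messages: List[Message], text: str,
                    potemplate: Optional[str]) -> Message:
    """Reuse an existing message with this text and potemplate value, or
    create one (the "duplicate if needed" step)."""
    for message in messages:
        if message.translation == text and message.potemplate == potemplate:
            return message
    message = Message(text, potemplate)
    messages.append(message)
    return message


def update_translation(messages: List[Message], template: str,
                       text: str, share: bool) -> Message:
    """Make `text` the current translation as seen from `template`.
    share=True covers "converging" and "staying shared"; share=False
    covers "staying diverged"."""
    diverged = next((m for m in messages
                     if m.potemplate == template and m.is_current), None)
    shared = next((m for m in messages
                   if m.potemplate is None and m.is_current), None)

    if share:
        if diverged is not None:
            # Converging: the old diverged message is no longer current.
            diverged.is_current = False
        if shared is not None:
            # Move the is_current bit away from the old shared message.
            shared.is_current = False
        # The new current message is shared: its potemplate stays cleared.
        new = _find_or_create(messages, text, None)
    else:
        if diverged is not None:
            diverged.is_current = False
        # Staying (or becoming) diverged: tie the message to this template,
        # leaving the shared message current for all other templates.
        new = _find_or_create(messages, text, template)

    new.is_current = True
    return new


messages = [Message("Hallo", None, is_current=True)]
hoi = update_translation(messages, "evolution-hardy", "Hoi", share=False)
back = update_translation(messages, "evolution-hardy", "Hallo", share=True)
```

In this usage, "Hoi" first diverges for one template while "Hallo" stays current for all others. Converging back then reuses the existing shared "Hallo" message instead of creating a new row.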

Schema changes

Schema additions

On TranslationMessage:

New table TranslationTemplateItem links POTMsgSet to POTemplate.
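A rough sketch of the new linking table follows, using Python's built-in sqlite3 module as a stand-in for the production database. Only the table name TranslationTemplateItem and its links to POTMsgSet and POTemplate come from this spec. The id and sequence columns, the uniqueness constraint, and the minimal POTMsgSet and POTemplate tables are assumptions for illustration.

```python
# Hypothetical schema sketch; SQLite stands in for the real database and
# most column choices are assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE POTemplate (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE POTMsgSet (id INTEGER PRIMARY KEY, msgid TEXT NOT NULL);

-- m:n link: one row per (template, message set) pair.
CREATE TABLE TranslationTemplateItem (
    id INTEGER PRIMARY KEY,
    potemplate INTEGER NOT NULL REFERENCES POTemplate(id),
    potmsgset INTEGER NOT NULL REFERENCES POTMsgSet(id),
    sequence INTEGER NOT NULL,  -- assumed: position within the template
    UNIQUE (potemplate, potmsgset)
);
""")

conn.executemany("INSERT INTO POTemplate (id, name) VALUES (?, ?)",
                 [(1, "evolution (hardy)"), (2, "evolution (intrepid)")])
conn.execute("INSERT INTO POTMsgSet (id, msgid) VALUES (1, 'Hello')")

# The same POTMsgSet appears in both templates.
conn.executemany(
    "INSERT INTO TranslationTemplateItem (potemplate, potmsgset, sequence) "
    "VALUES (?, ?, ?)", [(1, 1, 1), (2, 1, 1)])

templates_using_hello = [row[0] for row in conn.execute(
    "SELECT potemplate FROM TranslationTemplateItem "
    "WHERE potmsgset = 1 ORDER BY potemplate")]
print(templates_using_hello)  # [1, 2]
```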

Schema removals

On TranslationMessage:

On POTMsgSet:

Misc. notes

Messages for the same distribution can be shared if they have the same translation domain. In other words, for distribution translations, the sharing context is defined by (distribution, translation_domain). The source package name is not part of it, so sharing can happen across source packages.
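Grouping templates into sharing contexts might look like the hypothetical sketch below. For distributions, the key is (distribution, translation domain) as described above, with the source package name deliberately left out. For products, the key uses the product and template name, following the batching criteria in step 6 of the plan. The dictionary fields and helper names are made up for the example.

```python
# Hypothetical sketch: grouping templates by sharing context. The dict
# fields are illustrative, not Launchpad's actual attributes.
from collections import defaultdict
from typing import Dict, List, Tuple


def sharing_key(template: dict) -> Tuple[str, str, str]:
    if template.get("distribution"):
        # Distribution templates share by translation domain; the source
        # package name is deliberately not part of the key.
        return ("distribution", template["distribution"],
                template["translation_domain"])
    # Product templates share by template name within the product.
    return ("product", template["product"], template["name"])


def group_by_sharing_context(templates: List[dict]) -> Dict[tuple, List[int]]:
    groups: Dict[tuple, List[int]] = defaultdict(list)
    for template in templates:
        groups[sharing_key(template)].append(template["id"])
    return dict(groups)


templates = [
    {"id": 1, "distribution": "ubuntu", "series": "hardy",
     "sourcepackage": "pkg-a", "translation_domain": "shared-domain"},
    {"id": 2, "distribution": "ubuntu", "series": "intrepid",
     "sourcepackage": "pkg-b", "translation_domain": "shared-domain"},
    {"id": 3, "product": "foo", "series": "trunk", "name": "foo"},
]

groups = group_by_sharing_context(templates)
# Templates 1 and 2 share messages despite coming from different packages.
print(groups)
# {('distribution', 'ubuntu', 'shared-domain'): [1, 2], ('product', 'foo', 'foo'): [3]}
```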

Show sharing in the UI the same way "Packaged" translations are shown.

Translations/Specs/MessageSharing (last edited 2009-02-12 16:53:36 by jtv)