Diff for "Translations/Specs/MessageSharing/Migration"

Differences between revisions 1 and 2
Revision 1 as of 2009-02-12 16:51:09
Size: 6303
Editor: jtv
Comment:
Revision 2 as of 2009-02-12 16:53:15
Size: 6230
Editor: jtv
Comment:
Line 25: Line 25:
 * Cleanup  * Initial Cleanup
Line 30: Line 30:
=== Cleanup === === Initial Cleanup ===
Line 61: Line 61:
 1. For each {{{TranslationMessage}}}, replace potmsgset with its representative {{{POTMsgSet}}.
 1. From each {{{POTMsgSet}}} equivalence class, delete all members but the representative one. Be careful: new {{{TranslationMessage}}}s for these may have appeared in the meantime.  Perhaps the best thing to
 1. For each {{{TranslationMessage}}}, replace potmsgset with its representative {{{POTMsgSet}}}.
 1. From each {{{POTMsgSet}}} equivalence class, delete all members but the representative one. Be careful: new {{{TranslationMessage}}}s for these may have appeared in the meantime.
Line 82: Line 82:
 * TranslationMessage: potemplate is null where not (is_current or is_imported).
 * TranslationMessage: unique (potmsgset, potemplate, language, COALESCE(variant, -1)) where potemplate is not null.
 * TranslationMessage: unique (potmsgset, language, COALESCE(variant, -1), msgstr0, ...) where potemplate is null.
 * TranslationMessage: unique (potmsgset, COALESCE(potemplate, -1), language, COALESCE(variant, -1)) where is_current is true.
 * TranslationMessage: unique (potmsgset, COALESCE(potemplate, -1), language, COALESCE(variant, -1)) where is_imported is true.
 * On {{{TranslationMessage}}}:
  * potemplate is null where not (is_current or is_imported).
  * unique (potmsgset, potemplate, language, COALESCE(variant, -1)) where potemplate is not null.
  * unique (potmsgset, language, COALESCE(variant, -1), msgstr0, ...) where potemplate is null.
  * unique (potmsgset, COALESCE(potemplate, -1), language, COALESCE(variant, -1)) where is_current is true.
  * unique (potmsgset, COALESCE(potemplate, -1), language, COALESCE(variant, -1)) where is_imported is true.

Message Sharing Migration

Part of the message sharing project.

Assumptions

  • message-sharing-populate has run.
  • Codebase is fully message-sharing-enabled.
  • Deleted schema elements are no longer used.

Code changes

To be landed before migration:

  • Define TranslationMessage.converge():

    1. Look for a TranslationMessage (TM) "shared" with the same (potmsgset, language, COALESCE(variant, -1), msgstr0, ...) as self, where shared.potemplate is null.
    2. If found: delete self.
    3. If not found: set self.potemplate = NULL.

  • When clearing is_imported in validate_is_imported or is_current in validate_is_current, converge the TM that's having the flag cleared if it ends up being neither imported nor current.
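
The converge() rule and the flag-clearing hook above can be sketched in plain Python. This is only an illustration under stated assumptions: the TM dataclass, the list standing in for the table, and the clear_is_current helper are hypothetical and are not the real Launchpad classes. The early return for an already-shared message is also an assumption; the spec does not say what converge() does in that case.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(eq=False)  # identity comparison, so list.remove() drops this exact row
class TM:
    """Hypothetical in-memory stand-in for a TranslationMessage row."""
    id: int
    potmsgset: int
    language: int
    msgstrs: Tuple[Optional[int], ...]   # msgstr0, ... as translation ids
    potemplate: Optional[int] = None     # None means "shared"
    variant: Optional[str] = None
    is_current: bool = False
    is_imported: bool = False

    def sharing_key(self):
        # Mirrors (potmsgset, language, COALESCE(variant, -1), msgstr0, ...).
        variant = -1 if self.variant is None else self.variant
        return (self.potmsgset, self.language, variant, self.msgstrs)


def converge(tm: TM, table: List[TM]) -> None:
    """Fold tm into an equivalent shared TM, or make tm itself shared."""
    if tm.potemplate is None:
        return  # assumption of this sketch: already shared, nothing to do
    for other in table:
        if (other is not tm and other.potemplate is None
                and other.sharing_key() == tm.sharing_key()):
            table.remove(tm)   # step 2: shared equivalent found, delete self
            return
    tm.potemplate = None       # step 3: none found, become the shared TM


def clear_is_current(tm: TM, table: List[TM]) -> None:
    """Hypothetical hook: converge once the TM is neither current nor imported."""
    tm.is_current = False
    if not tm.is_imported:
        converge(tm, table)


a = TM(1, potmsgset=10, language=1, msgstrs=(7,))
b = TM(2, potmsgset=10, language=1, msgstrs=(7,), potemplate=3, is_current=True)
c = TM(3, potmsgset=10, language=1, msgstrs=(8,), potemplate=3,
       is_current=True, is_imported=True)
d = TM(4, potmsgset=10, language=1, msgstrs=(9,), potemplate=3, is_current=True)
table = [a, b, c, d]

for tm in (b, c, d):
    clear_is_current(tm, table)
```

In this run b is deleted because a already carries the same translation as a shared message, c stays diverged because it is still imported, and d becomes shared.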

Migration steps

Migration consists of these phases:

  • Initial Cleanup
  • Merge POTMsgSets

  • Converge TranslationMessages

Initial Cleanup

Before migrating, we should delete all non-current POTemplates. This eliminates some complications in migration.

Before we can delete a POTemplate, we must delete any rows that refer to it:

  1. Clear TranslationMessage.pofile if the column still exists. We won't be using it anymore.

  2. Delete all TranslationMessages that refer to the template in their potemplate fields.

  3. Find all TranslationTemplateItems that refer to the template, and delete them.

  4. Find any POTMsgSets that the deleted TranslationTemplateItems referred to, and that have no other TranslationTemplateItems referring to them.

  5. Delete any TranslationMessages attached to those POTMsgSets.

  6. Delete those POTMsgSets.

  7. Delete all POFiles that refer to the template. Any references in POFileTranslator should have gone away when we deleted the TranslationMessages.

  8. Delete the POTemplate.
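
The deletion order above can be tried out against a toy schema. The sketch below uses SQLite through Python's sqlite3 module as a stand-in for the real database; the table definitions are heavily simplified assumptions and delete_template is a hypothetical helper. Step 1 is narrowed here to the POFiles of the template being deleted, which is an interpretation rather than something the spec states. Foreign keys are switched on so that a wrong order would fail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # make ordering mistakes fail loudly
conn.executescript("""
CREATE TABLE POTemplate (id INTEGER PRIMARY KEY);
CREATE TABLE POFile (
    id INTEGER PRIMARY KEY,
    potemplate INTEGER REFERENCES POTemplate(id));
CREATE TABLE POTMsgSet (id INTEGER PRIMARY KEY);
CREATE TABLE TranslationTemplateItem (
    id INTEGER PRIMARY KEY,
    potemplate INTEGER REFERENCES POTemplate(id),
    potmsgset INTEGER REFERENCES POTMsgSet(id));
CREATE TABLE TranslationMessage (
    id INTEGER PRIMARY KEY,
    potmsgset INTEGER REFERENCES POTMsgSet(id),
    potemplate INTEGER REFERENCES POTemplate(id),
    pofile INTEGER REFERENCES POFile(id));

INSERT INTO POTemplate VALUES (1), (2);          -- 2 is the obsolete template
INSERT INTO POFile VALUES (100, 2);
INSERT INTO POTMsgSet VALUES (10), (11);
INSERT INTO TranslationTemplateItem VALUES (1, 1, 10), (2, 2, 10), (3, 2, 11);
INSERT INTO TranslationMessage VALUES
    (1, 10, NULL, NULL), (2, 11, 2, 100), (3, 11, NULL, 100), (4, 10, 2, NULL);
""")


def delete_template(conn, template_id):
    """Delete one POTemplate and everything that refers to it, in spec order."""
    t = (template_id,)
    # 1. Clear TranslationMessage.pofile (here: only for this template's POFiles).
    conn.execute("UPDATE TranslationMessage SET pofile = NULL WHERE pofile IN "
                 "(SELECT id FROM POFile WHERE potemplate = ?)", t)
    # 2. Delete TranslationMessages diverged to this template.
    conn.execute("DELETE FROM TranslationMessage WHERE potemplate = ?", t)
    # 3. Remember the POTMsgSets involved, then delete the template's items.
    msgsets = [row[0] for row in conn.execute(
        "SELECT DISTINCT potmsgset FROM TranslationTemplateItem "
        "WHERE potemplate = ?", t)]
    conn.execute("DELETE FROM TranslationTemplateItem WHERE potemplate = ?", t)
    # 4. Keep only the POTMsgSets that no other template item refers to.
    orphans = [m for m in msgsets if conn.execute(
        "SELECT 1 FROM TranslationTemplateItem WHERE potmsgset = ? LIMIT 1",
        (m,)).fetchone() is None]
    for m in orphans:
        # 5. Delete TranslationMessages attached to the orphan, then 6. the orphan.
        conn.execute("DELETE FROM TranslationMessage WHERE potmsgset = ?", (m,))
        conn.execute("DELETE FROM POTMsgSet WHERE id = ?", (m,))
    # 7. Delete the template's POFiles, then 8. the template itself.
    conn.execute("DELETE FROM POFile WHERE potemplate = ?", t)
    conn.execute("DELETE FROM POTemplate WHERE id = ?", t)
    conn.commit()
    return orphans


orphans = delete_template(conn, 2)


def ids(table):
    return [row[0] for row in conn.execute(f"SELECT id FROM {table} ORDER BY id")]
```

POTMsgSet 10 survives because template 1 still uses it; POTMsgSet 11 was only attached to the deleted template, so it goes away together with its messages.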

Merge POTMsgSets

TODO: Properly merge TranslationMessages that already have their potemplate set to null (e.g. XPI English "translations") but whose POTMsgSets aren't merged yet.

At the end of this phase, TranslationMessages will still be mostly diverged, but they'll be sharing POTMsgSets.

This phase requires at least a freeze on imports.

It's best to go through this separately per Ubuntu release, and maybe once for everything else. We must stop imports for whatever product or distroseries we're merging, which we can do individually for each Ubuntu series.

  • Define an equivalence class of POTemplates as the set of POTemplates that share the same name (not translation domain?) within either the same Product or the same DistroSeries.

  • Define an equivalence class of POTMsgSets as the set of POTMsgSets that are attached (by TranslationTemplateItem) to POTemplates of the same equivalence class, and that have the same (msgid_singular, COALESCE(msgid_plural, -1), COALESCE(context, '')).

  • Define the representative POTMsgSet in an equivalence class as the one in the series that has translation focus, if any; or failing that, the one with the highest id.

Steps:

  1. For each equivalence class of POTMsgSets, merge all TranslationTemplateItems into those of the representative POTMsgSet.

  2. For each TranslationMessage, replace potmsgset with its representative POTMsgSet.

  3. From each POTMsgSet equivalence class, delete all members but the representative one. Be careful: new TranslationMessages for these may have appeared in the meantime.
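
A minimal sketch of how the equivalence classes and representatives defined above might be computed, in plain Python. Everything here is hypothetical scaffolding: the Template and MsgSet dataclasses, the owner field (standing in for whatever groups templates of one Product or distribution), and the simplification that each POTMsgSet is attached to exactly one template.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Template:
    id: int
    name: str
    owner: str               # hypothetical grouping key for the template class
    has_focus: bool = False  # template lives in the series with translation focus


@dataclass
class MsgSet:
    id: int
    template: Template       # simplification: one TranslationTemplateItem each
    msgid_singular: int
    msgid_plural: Optional[int] = None
    context: Optional[str] = None


def class_key(m: MsgSet):
    # Template class plus (msgid_singular, COALESCE(msgid_plural, -1),
    # COALESCE(context, '')).
    plural = -1 if m.msgid_plural is None else m.msgid_plural
    return (m.template.name, m.template.owner,
            m.msgid_singular, plural, m.context or "")


def pick_representative(members: List[MsgSet]) -> MsgSet:
    # Prefer the member in the focus series; otherwise take the highest id.
    focused = [m for m in members if m.template.has_focus]
    return max(focused or members, key=lambda m: m.id)


def representative_map(msgsets: List[MsgSet]) -> Dict[int, int]:
    """Map every POTMsgSet id to the id of its class representative."""
    classes = defaultdict(list)
    for m in msgsets:
        classes[class_key(m)].append(m)
    mapping = {}
    for members in classes.values():
        rep = pick_representative(members)
        for m in members:
            mapping[m.id] = rep.id
    return mapping


t1 = Template(1, "evolution", "product:evolution", has_focus=True)
t2 = Template(2, "evolution", "product:evolution")
t3 = Template(3, "gedit", "product:gedit")
t4 = Template(4, "gedit", "product:gedit")
msgsets = [MsgSet(100, t1, 5), MsgSet(200, t2, 5), MsgSet(201, t2, 6),
           MsgSet(300, t3, 5), MsgSet(301, t4, 5)]

mapping = representative_map(msgsets)

# Step 2: re-point each TranslationMessage (tm id -> potmsgset id).
tm_potmsgset = {1: 100, 2: 200, 3: 300}
tm_potmsgset = {tm: mapping[ms] for tm, ms in tm_potmsgset.items()}

# Step 3: members that are not representatives can be deleted afterwards.
obsolete = sorted(set(mapping) - set(mapping.values()))
```

Here POTMsgSet 100 wins over 200 because its template has translation focus, while 301 wins over 300 only by having the higher id. In a live system the re-pointing in step 2 would have to be repeated or locked against new TranslationMessages before step 3 runs.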

Converge TranslationMessages

This phase operates only on TranslationMessage, and can be done at leisure.

  • Converge all TMs where is_current is false and is_imported is false.
  • Converge TMs where is_current or is_imported, and where TM.potmsgset.potemplate.distroseries has translation focus.
  • Converge TMs where is_current or is_imported, and where TM.potmsgset.potemplate.productseries has translation focus.
  • Clean up POTMsgSets without TranslationTemplateItems attached.

Database constraints

Once the assumptions are fulfilled:

After rolling out code changes:

After migration:

  • On TranslationMessage:

    • potemplate is null where not (is_current or is_imported).
    • unique (potmsgset, potemplate, language, COALESCE(variant, -1)) where potemplate is not null.
    • unique (potmsgset, language, COALESCE(variant, -1), msgstr0, ...) where potemplate is null.
    • unique (potmsgset, COALESCE(potemplate, -1), language, COALESCE(variant, -1)) where is_current is true.
    • unique (potmsgset, COALESCE(potemplate, -1), language, COALESCE(variant, -1)) where is_imported is true.
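
One way to express the constraints above is a CHECK constraint plus partial unique indexes. The sketch below tries this in SQLite through Python's sqlite3 module, purely as a stand-in: the production database may need different syntax or types, the index names are invented, the table is cut down to a single msgstr0 column, and try_insert is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TranslationMessage (
    id INTEGER PRIMARY KEY,
    potmsgset INTEGER NOT NULL,
    potemplate INTEGER,
    language INTEGER NOT NULL,
    variant TEXT,
    msgstr0 INTEGER,
    is_current INTEGER NOT NULL DEFAULT 0,
    is_imported INTEGER NOT NULL DEFAULT 0,
    -- potemplate is null where not (is_current or is_imported)
    CHECK (potemplate IS NULL OR is_current OR is_imported)
);
CREATE UNIQUE INDEX tm_diverged_key ON TranslationMessage
    (potmsgset, potemplate, language, COALESCE(variant, -1))
    WHERE potemplate IS NOT NULL;
CREATE UNIQUE INDEX tm_shared_key ON TranslationMessage
    (potmsgset, language, COALESCE(variant, -1), msgstr0)
    WHERE potemplate IS NULL;
CREATE UNIQUE INDEX tm_current_key ON TranslationMessage
    (potmsgset, COALESCE(potemplate, -1), language, COALESCE(variant, -1))
    WHERE is_current = 1;
CREATE UNIQUE INDEX tm_imported_key ON TranslationMessage
    (potmsgset, COALESCE(potemplate, -1), language, COALESCE(variant, -1))
    WHERE is_imported = 1;
""")


def try_insert(**cols):
    """Insert one row; return False if a constraint rejects it."""
    names = ", ".join(cols)
    marks = ", ".join("?" for _ in cols)
    try:
        conn.execute(
            f"INSERT INTO TranslationMessage ({names}) VALUES ({marks})",
            tuple(cols.values()))
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    # A shared suggestion: accepted.
    try_insert(potmsgset=1, language=1, msgstr0=7),
    # The same shared translation again: rejected by tm_shared_key.
    try_insert(potmsgset=1, language=1, msgstr0=7),
    # Diverged but neither current nor imported: rejected by the CHECK.
    try_insert(potmsgset=1, potemplate=5, language=1, msgstr0=8),
    # Diverged and current: accepted.
    try_insert(potmsgset=1, potemplate=5, language=1, msgstr0=8, is_current=1),
    # A second diverged message for the same template: rejected.
    try_insert(potmsgset=1, potemplate=5, language=1, msgstr0=9, is_imported=1),
]
```

Note that a unique index treats NULL values as distinct, so a real tm_shared_key would probably need COALESCE around each msgstr column as well; that detail is left out of this sketch.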

Checks

Some things we should check regularly after migration:

  • Because of race conditions, a diverged message may occasionally end up with the same potmsgset and translations as a shared one. We can only guard against that efficiently in Python, so we should also run routine checks and cleanups on the database.
    •     SELECT
              diverged_tm.id,
              converged_tm.id,
              diverged_tm.is_current,
              diverged_tm.is_imported
          FROM TranslationMessage diverged_tm
          JOIN TranslationMessage converged_tm ON
              diverged_tm.potmsgset = converged_tm.potmsgset AND
              diverged_tm.language = converged_tm.language AND
              COALESCE(diverged_tm.variant, -1) = COALESCE(converged_tm.variant, -1) AND
              COALESCE(diverged_tm.msgstr0, -1) = COALESCE(converged_tm.msgstr0, -1) AND
              COALESCE(diverged_tm.msgstr1, -1) = COALESCE(converged_tm.msgstr1, -1) AND
              COALESCE(diverged_tm.msgstr2, -1) = COALESCE(converged_tm.msgstr2, -1) AND
              COALESCE(diverged_tm.msgstr3, -1) = COALESCE(converged_tm.msgstr3, -1) AND
              COALESCE(diverged_tm.msgstr4, -1) = COALESCE(converged_tm.msgstr4, -1) AND
              COALESCE(diverged_tm.msgstr5, -1) = COALESCE(converged_tm.msgstr5, -1)
          WHERE
              diverged_tm.potemplate IS NOT NULL AND
              converged_tm.potemplate IS NULL;
    • Delete diverged_tm first to avoid running into database constraints.
    • Then "OR" diverged_tm's is_current/is_imported flags into those of converged_tm.
  • Every (potmsgset, potemplate) combination in TranslationMessage must occur in TranslationTemplateItem.
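
The routine cleanup for diverged duplicates described in the checks above could look roughly like this. It is a sketch under assumptions: SQLite via Python's sqlite3 module stands in for the real database, the table is reduced to one msgstr0 column and has no variant, and absorb_diverged_duplicates is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TranslationMessage (
    id INTEGER PRIMARY KEY,
    potmsgset INTEGER NOT NULL,
    potemplate INTEGER,
    language INTEGER NOT NULL,
    msgstr0 INTEGER,
    is_current INTEGER NOT NULL DEFAULT 0,
    is_imported INTEGER NOT NULL DEFAULT 0
);
INSERT INTO TranslationMessage VALUES
    (1, 10, NULL, 1, 7, 0, 1),   -- shared
    (2, 10, 5,    1, 7, 1, 0),   -- diverged, same translation as row 1
    (3, 10, 5,    1, 8, 0, 1);   -- diverged, different translation
""")


def absorb_diverged_duplicates(conn):
    """Fold diverged TMs into shared TMs carrying the same translation."""
    pairs = conn.execute("""
        SELECT d.id, c.id, d.is_current, d.is_imported
        FROM TranslationMessage d
        JOIN TranslationMessage c ON
            d.potmsgset = c.potmsgset AND
            d.language = c.language AND
            COALESCE(d.msgstr0, -1) = COALESCE(c.msgstr0, -1)
        WHERE d.potemplate IS NOT NULL AND c.potemplate IS NULL
    """).fetchall()
    for diverged_id, shared_id, was_current, was_imported in pairs:
        # Delete the diverged row first so that raising flags on the shared
        # row cannot collide with uniqueness constraints.
        conn.execute("DELETE FROM TranslationMessage WHERE id = ?",
                     (diverged_id,))
        # "OR" the diverged row's flags into the shared row.
        conn.execute(
            "UPDATE TranslationMessage "
            "SET is_current = MAX(is_current, ?), "
            "    is_imported = MAX(is_imported, ?) "
            "WHERE id = ?",
            (was_current, was_imported, shared_id))
    conn.commit()
    return len(pairs)


absorbed = absorb_diverged_duplicates(conn)
rows = conn.execute(
    "SELECT id, potemplate, is_current, is_imported "
    "FROM TranslationMessage ORDER BY id").fetchall()
```

After the run, row 2 is gone and the shared row 1 is both current and imported; row 3 is untouched because its translation differs.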

Translations/Specs/MessageSharing/Migration (last edited 2009-04-28 11:36:55 by jtv)