Diff for "Translations/Specs/MessageSharing/Migration"


Differences between revisions 26 and 27
Revision 26 as of 2009-03-24 15:39:27
Size: 8469
Editor: jtv
Comment:
Revision 27 as of 2009-03-25 13:03:32
Size: 8609
Editor: jtv
Comment:
Line 49: Line 49:
            # XXX: Similar for is_imported now!
Line 51: Line 52:
            # No matching shared message yet, so become one.
            clash_with_shared_current = self.is_current and current is not None
            clash_with_shared_imported = self.is_imported and imported is not None
            if not (clash_with_shared_current or clash_with_shared_imported):
                # Make this message shared.
                self.potemplate = None
Line 53: Line 58:
            # If we're about to converge: don't supplant an existing shared imported
            # message. If we're staying diverged: don't permit imported messages to
            # be diverged.
            if imported is not None: self.is_imported = False

            # Converge, unless this is to be a diverged current message.
            if current is None or not self.is_current: self.potemplate = None
            # XXX: If there are two clashes, this message should stay diverged.
            # But if there is exactly one clash, we could in principle clone the
            # message so there's one shared version and one diverged version.

Message Sharing Migration

Part of the message sharing project.

XXX: There should be a unique index on TranslationTemplateItem(potemplate, sequence) for non-null, non-zero sequence numbers.

Assumptions

  • message-sharing-populate has run.
  • Codebase is fully message-sharing-enabled.
  • Schema elements that are to be removed are either gone or no longer used.
  • Discarded: "Imported messages are always shared." Danilo's current branch no longer requires this.

Migration steps

Migration consists of these phases:

  • Code changes
  • Merge POTMsgSets

  • Merge TranslationMessages

  • Restore POFileTranslator

  • Additional database checks and constraints

Code changes

To be landed before migration:

  • (Pseudocode; any resemblance to Python is pure coincidence.) This still assumes that imported messages are always shared, which is no longer required.

    class TranslationMessage:
        # ...
        def converge(self):
            """Make this message shared if possible, or merge it into an existing shared one."""
            if self.potemplate is None: return
    
            shared = TranslationMessage.get(potemplate=None, potmsgset=self.potmsgset, language=self.language, variant=self.variant, translations=self.translations)
            current = TranslationMessage.get(potemplate=None, potmsgset=self.potmsgset, language=self.language, variant=self.variant, is_current=True)
            imported = TranslationMessage.get(potemplate=None, potmsgset=self.potmsgset, language=self.language, variant=self.variant, is_imported=True)
    
            if shared:
                # There's a shared message matching this one.  Try to merge.
                if self.is_imported and imported is None:
                    # Bequeath is_imported flag to shared equivalent.
                    shared.is_imported = True
                if self.is_current and current is None:
                    # Bequeath is_current flag to shared equivalent.
                    self.is_current = False
                    shared.is_current = True
                # This message is only worth keeping if it's a diverged current message.
                # XXX: Similar for is_imported now!
                if not (self.is_current and not shared.is_current): self.destroy()
            else:
                clash_with_shared_current = self.is_current and current is not None
                clash_with_shared_imported = self.is_imported and imported is not None
                if not (clash_with_shared_current or clash_with_shared_imported):
                    # Make this message shared.
                    self.potemplate = None
    
                # XXX: If there are two clashes, this message should stay diverged.
                # But if there is exactly one clash, we could in principle clone the
                # message so there's one shared version and one diverged version.

Note: When clearing is_current in validate_is_current, converge the TM that's having the flag cleared.
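
The converge() pseudocode above can be exercised on a small in-memory model. This is only a sketch under stated assumptions, not Launchpad code: the Message class, the store list and the find_shared helper are hypothetical stand-ins for TranslationMessage and its lookups, and variant is left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # identity comparison, so store.remove() drops exactly this row
class Message:
    """Hypothetical stand-in for TranslationMessage (variant omitted)."""
    potmsgset: int
    language: str
    translations: tuple
    potemplate: Optional[int] = None  # None means "shared"
    is_current: bool = False
    is_imported: bool = False


def find_shared(store, msg, **criteria):
    """Return the first other shared message for msg's potmsgset and language
    whose attributes match all of the given criteria, or None."""
    for other in store:
        if other is msg or other.potemplate is not None:
            continue
        if (other.potmsgset, other.language) != (msg.potmsgset, msg.language):
            continue
        if all(getattr(other, name) == value for name, value in criteria.items()):
            return other
    return None


def converge(store, msg):
    """Make msg shared if possible, or merge it into an equivalent shared one."""
    if msg.potemplate is None:
        return
    shared = find_shared(store, msg, translations=msg.translations)
    current = find_shared(store, msg, is_current=True)
    imported = find_shared(store, msg, is_imported=True)

    if shared is not None:
        if msg.is_imported and imported is None:
            shared.is_imported = True
        if msg.is_current and current is None:
            msg.is_current = False
            shared.is_current = True
        # Keep msg only if it is still a diverged current message.
        if not (msg.is_current and not shared.is_current):
            store.remove(msg)
    else:
        clash_current = msg.is_current and current is not None
        clash_imported = msg.is_imported and imported is not None
        if not (clash_current or clash_imported):
            msg.potemplate = None
```

In this model a diverged current message whose translations match a shared, non-current message hands over its is_current flag and is removed, while a diverged current message that clashes with a different shared current message stays diverged.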

Merge POTMsgSets

At the end of this phase, TranslationMessages will still be mostly diverged, but they'll be sharing POTMsgSets.

This phase requires at least a freeze on imports.

It's best to go through this separately per Ubuntu release, and maybe once for everything else. We must stop imports for whatever product or distroseries we're merging, which we can do individually for each Ubuntu series.

Commit the transaction after every equivalence class of POTemplates. Keep track of which classes are done, so that an interrupted run can be restarted.
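
One way to get this commit-and-restart behaviour is sketched below. It is an illustration only: the migrate and load_done functions, the JSON progress file, and the merge and commit callables are all hypothetical names, not part of the codebase.

```python
import json
import os


def load_done(path):
    """Read the set of finished equivalence-class keys, if a progress file exists."""
    if not os.path.exists(path):
        return set()
    with open(path) as progress:
        return set(json.load(progress))


def migrate(classes, merge, commit, progress_path):
    """Merge each equivalence class once, committing and recording it afterwards.

    classes maps a class key (a string) to its list of templates; merge and
    commit are callables supplied by the caller (hypothetical interface).
    """
    done = load_done(progress_path)
    for key in sorted(classes):
        if key in done:
            continue  # Finished in an earlier run.
        merge(classes[key])
        commit()
        done.add(key)
        with open(progress_path, "w") as progress:
            json.dump(sorted(done), progress)
    return done
```

A crash between commit() and the progress write would cause one class to be processed again on restart, so this sketch assumes that merging an already merged class is harmless. Recording progress in the database, inside the same transaction, would avoid that window.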

  • Define an equivalence class of POTemplates as the set of POTemplates with the same name (not translation domain?) within either the same Product or the same DistroSeries.

  • Define an equivalence class of POTMsgSets as the set of POTMsgSets attached (through TranslationTemplateItem) to POTemplates of the same equivalence class and having the same (msgid_singular, COALESCE(msgid_plural, -1), COALESCE(context, '')).

  • Define the representative POTMsgSet in an equivalence class as the one in the series that has translation focus, if any; or failing that, the one with the highest id.
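
The three definitions above can be sketched in a few lines. The Template and MsgSet records and their fields (owner, has_focus) are hypothetical simplifications chosen for illustration; the real POTemplate and POTMsgSet classes look different.

```python
from collections import defaultdict, namedtuple

# Hypothetical minimal records; owner stands for a Product or a DistroSeries.
Template = namedtuple("Template", "id name owner has_focus")
MsgSet = namedtuple("MsgSet", "id template msgid_singular msgid_plural context")


def template_classes(templates):
    """Group templates into equivalence classes keyed by (owner, name)."""
    classes = defaultdict(list)
    for template in templates:
        classes[(template.owner, template.name)].append(template)
    return dict(classes)


def msgset_key(msgset):
    """Mirror the COALESCE()s: a missing plural or context compares as one value."""
    plural = -1 if msgset.msgid_plural is None else msgset.msgid_plural
    return (msgset.msgid_singular, plural, msgset.context or "")


def representatives(msgsets):
    """Pick one POTMsgSet per key from the msgsets of a single template class:
    prefer the series with translation focus, then the highest id."""
    best = {}
    for msgset in msgsets:
        key = msgset_key(msgset)
        rank = (msgset.template.has_focus, msgset.id)
        if key not in best or rank > best[key][0]:
            best[key] = (rank, msgset)
    return {key: msgset for key, (rank, msgset) in best.items()}
```

Ranking by the tuple (has_focus, id) encodes "focus first, highest id otherwise" in a single comparison, because True sorts above False.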

Pseudocode:

def get_key(potmsgset):
    return (potmsgset.msgid_singular, potmsgset.msgid_plural, potmsgset.context)


def merge_potmsgsets(potemplates):
    # Sort potemplates from "most representative" to "least representative."
    potemplates.sort(cmp=template_precedence)

    representatives = {}
    subordinates = {}

    # Figure out representative potmsgsets and their subordinates.
    for template in potemplates:
        for potmsgset in template.potmsgsets:
            key = get_key(potmsgset)
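            if key not in representatives: representatives[key] = potmsgset
            representative = representatives[key]
            if representative not in subordinates:
                subordinates[representative] = []
            # A potmsgset that is already shared between templates must not
            # become its own subordinate, or it would be destroyed below.
            if potmsgset is not representative and potmsgset not in subordinates[representative]:
                subordinates[representative].append(potmsgset)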

    for representative, potmsgsets in subordinates.iteritems():
        for subordinate in potmsgsets:
            merge_translationtemplateitems(subordinate, representative)

            for message in subordinate.translation_messages:
                if message.potemplate is None:
                    # Guard against multiple shared imported messages.
                    if message.is_imported:
                        imported = representative.getImportedMessage(message.language, message.variant, potemplate=None)
                        if imported is not None: message.is_imported = False
                    # Guard against multiple shared current messages.
                    if message.is_current:
                        current = representative.getCurrentMessage(message.language, message.variant, potemplate=None)
                        if current is not None: message.is_current = False
                message.potmsgset = representative
            subordinate.destroy()

Merge TranslationMessages

This phase operates only on TranslationMessage, and can be done at leisure.

Use TranslationMessage.converge() as defined above. Run it on all TranslationMessages in distroseries/productseries that have translation focus, then on all other series.

Database constraints

Once the assumptions are fulfilled:

  • If we still have POFile, drop all constraints involving TranslationMessage.pofile.

After rolling out code changes:

After migration:

  • On TranslationMessage:

    • potemplate is null where is_imported
    • potemplate is null where not is_current
    • unique (potmsgset, COALESCE(potemplate, -1), language, COALESCE(variant, ''), COALESCE(msgstr0, -1), …)

    • unique (potmsgset, COALESCE(potemplate, -1), language, COALESCE(variant, '')) where is_current is true

    • unique (potmsgset, COALESCE(potemplate, -1), language, COALESCE(variant, '')) where is_imported is true
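
To illustrate why these constraints wrap nullable columns in COALESCE: a plain unique index treats NULLs as distinct, so two shared rows (potemplate NULL) would never collide. The sketch below tries out the is_current index and the "potemplate is null where not is_current" rule on a cut-down table. It uses SQLite through Python's sqlite3 module purely so that it is runnable; the reduced table, the index name and the helper functions are assumptions, and the production DDL would differ.

```python
import sqlite3

# Cut-down, hypothetical version of the TranslationMessage table.
SCHEMA = """
CREATE TABLE TranslationMessage (
    id INTEGER PRIMARY KEY,
    potmsgset INTEGER NOT NULL,
    potemplate INTEGER,
    language INTEGER NOT NULL,
    variant TEXT,
    is_current INTEGER NOT NULL DEFAULT 0,
    -- potemplate is null where not is_current
    CHECK (potemplate IS NULL OR is_current = 1)
);
-- At most one current message per (potmsgset, potemplate, language, variant),
-- with NULL potemplate/variant mapped to sentinel values so they can collide.
CREATE UNIQUE INDEX tm__current__key ON TranslationMessage (
    potmsgset, COALESCE(potemplate, -1), language, COALESCE(variant, '')
) WHERE is_current = 1;
"""


def make_db():
    """Create an in-memory database with the demo schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def add_message(db, potmsgset, potemplate, language, is_current):
    """Insert one row; raises sqlite3.IntegrityError if a constraint is violated."""
    db.execute(
        "INSERT INTO TranslationMessage"
        " (potmsgset, potemplate, language, is_current) VALUES (?, ?, ?, ?)",
        (potmsgset, potemplate, language, int(is_current)),
    )
```

With this schema, one shared current message and one diverged current message for the same potmsgset and language can coexist, while a second shared current message or a diverged non-current message is rejected.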

Checks

Some things we should check regularly after migration:

  • Due to race conditions, a diverged message may occasionally end up with the same potmsgset and translations as a shared one. We can only guard against that efficiently in Python, so we also need to run routine checks and cleanups on the database.
    • Any rows we get out of this would be redundant:

          SELECT
              diverged_tm.id,
              converged_tm.id,
              diverged_tm.is_current,
              diverged_tm.is_imported
          FROM TranslationMessage diverged_tm
          JOIN TranslationMessage converged_tm ON
              diverged_tm.potmsgset = converged_tm.potmsgset AND
              diverged_tm.language = converged_tm.language AND
              diverged_tm.potemplate IS NOT NULL AND
              COALESCE(diverged_tm.variant, '') = COALESCE(converged_tm.variant, '') AND
              COALESCE(diverged_tm.msgstr0, -1) = COALESCE(converged_tm.msgstr0, -1) AND
              COALESCE(diverged_tm.msgstr1, -1) = COALESCE(converged_tm.msgstr1, -1) AND
              COALESCE(diverged_tm.msgstr2, -1) = COALESCE(converged_tm.msgstr2, -1) AND
              COALESCE(diverged_tm.msgstr3, -1) = COALESCE(converged_tm.msgstr3, -1) AND
              COALESCE(diverged_tm.msgstr4, -1) = COALESCE(converged_tm.msgstr4, -1) AND
              COALESCE(diverged_tm.msgstr5, -1) = COALESCE(converged_tm.msgstr5, -1)
          WHERE
              converged_tm.potemplate IS NULL AND
              converged_tm.is_current;
    • Use converge() to bring these redundant messages back into the fold.

  • Every (potmsgset, potemplate) combination in TranslationMessage must occur in TranslationTemplateItem.
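
The TranslationTemplateItem check could be run as a query along the following lines. This is a sketch on a cut-down schema, again using SQLite through Python's sqlite3 module so that it can be tried out; the demo tables and the find_orphans helper are hypothetical. It also assumes that only diverged messages (potemplate not null) are subject to the check, since shared messages have no potemplate to match.

```python
import sqlite3

# Cut-down, hypothetical versions of the two tables involved.
DEMO_SCHEMA = """
CREATE TABLE TranslationTemplateItem (
    id INTEGER PRIMARY KEY,
    potemplate INTEGER NOT NULL,
    potmsgset INTEGER NOT NULL,
    sequence INTEGER
);
CREATE TABLE TranslationMessage (
    id INTEGER PRIMARY KEY,
    potmsgset INTEGER NOT NULL,
    potemplate INTEGER
);
"""

# Diverged messages whose (potmsgset, potemplate) pair has no template item.
ORPHAN_QUERY = """
SELECT DISTINCT tm.potmsgset, tm.potemplate
FROM TranslationMessage tm
LEFT JOIN TranslationTemplateItem tti ON
    tti.potmsgset = tm.potmsgset AND
    tti.potemplate = tm.potemplate
WHERE
    tm.potemplate IS NOT NULL AND
    tti.id IS NULL
ORDER BY tm.potmsgset, tm.potemplate
"""


def find_orphans(db):
    """Return the (potmsgset, potemplate) pairs that violate the check."""
    return db.execute(ORPHAN_QUERY).fetchall()
```

An empty result means the check passes; any rows returned identify the combinations to investigate.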

Translations/Specs/MessageSharing/Migration (last edited 2009-04-28 11:36:55 by jtv)