= Message Sharing Migration =

Part of the [[Translations/Specs/MessageSharing|message sharing]] project.

== Assumptions ==

 * message-sharing-populate has run.
 * Codebase is fully message-sharing-enabled.
 * Schema elements that are to be removed are either gone or no longer used.
 * Templates within a Product are partitioned by template name.
 * Templates within a Distribution are partitioned by template name and source package name.
 * Either {{{TranslationMessage.potemplate}}} is never null, or suggestions are still allowed to be diverged.
 * '''Discarded:''' "Imported messages are always shared." Danilo's current branch no longer requires this.

== Migration steps ==

Migration consists of these phases:

 * Code changes
 * Merge {{{POTMsgSet}}}s
 * Merge {{{TranslationMessage}}}s
 * Restore {{{POFileTranslator}}}
 * Additional database checks and constraints

== Code changes ==

To be landed before migration:

 * (Pseudocode, any resemblance to Python is pure coincidence)

{{{
class TranslationMessage:
    # ...

    def converge(self):
        """Make this message shared if possible, or merge it into an existing shared one."""
        if self.potemplate is None:
            # Already shared; nothing to do.
            return

        shared = TranslationMessage.get(
            potemplate=None, potmsgset=self.potmsgset, language=self.language,
            variant=self.variant, translations=self.translations)
        current = TranslationMessage.get(
            potemplate=None, potmsgset=self.potmsgset, language=self.language,
            variant=self.variant, is_current=True)
        imported = TranslationMessage.get(
            potemplate=None, potmsgset=self.potmsgset, language=self.language,
            variant=self.variant, is_imported=True)

        if shared is None:
            clash_with_shared_current = self.is_current and current is not None
            clash_with_shared_imported = self.is_imported and imported is not None
            if not (clash_with_shared_current or clash_with_shared_imported):
                # Make this message shared.
                self.potemplate = None
            # If there are two clashes, this message should stay diverged.
            # XXX: If there is exactly one clash, we could in principle clone the
            # message so that there's one shared version and one diverged version.
        elif not (self.is_current or self.is_imported):
            # This is a suggestion duplicating an existing shared message.
            self.destroySelf()
        else:
            # Try to transfer current/imported flags to the shared equivalent.
            if self.is_current and current is None:
                shared.is_current = True
            if self.is_imported and imported is None:
                shared.is_imported = True

            same_current = (self.is_current == shared.is_current)
            same_imported = (self.is_imported == shared.is_imported)
            if same_current and same_imported:
                # This message is now totally redundant.
                self.destroySelf()
}}}

Note: when {{{validate_is_current}}} clears the is_current flag on a {{{TranslationMessage}}}, call {{{converge()}}} on that message.

=== Merge POTMsgSets ===

At the end of this phase, {{{TranslationMessage}}}s will still be mostly diverged, but they will share {{{POTMsgSet}}}s.

This phase requires at least a freeze on imports for whatever product or distroseries we are merging. It's best to go through this separately for each Ubuntu release, freezing imports one series at a time, and maybe once more for everything else.

Commit the transaction after each equivalence class of {{{POTemplate}}}s, and keep track of which classes are done so that we can restart the migration if it is interrupted.

 * Two {{{POTemplate}}}s are in the same equivalence class if they have the same name and belong to the same {{{Product}}}, or if they have the same name and source package name and belong to the same {{{Distribution}}}.
 * Two {{{POTMsgSet}}}s are in the same equivalence class if they are attached (through {{{TranslationTemplateItem}}}) to {{{POTemplate}}}s of the same equivalence class, and have the same (msgid_singular, COALESCE(msgid_plural, -1), COALESCE(context, '')).
 * The representative {{{POTMsgSet}}} of an equivalence class is the one in the series that has translation focus, if any; failing that, the one with the highest id.
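The equivalence-class and representative rules above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the migration script: the {{{POTemplate}}} and {{{POTMsgSet}}} dataclasses are hypothetical in-memory stand-ins for the database objects, the per-template {{{has_translation_focus}}} flag is an assumed simplification of "the series has translation focus", the helper names {{{template_class_key}}}, {{{potmsgset_key}}} and {{{pick_representatives}}} are made up for this example, and each {{{(template, potmsgset)}}} pair plays the role of a {{{TranslationTemplateItem}}} row.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


# Hypothetical in-memory stand-ins for the database objects; only the
# fields needed to compute equivalence classes are modelled.
@dataclass(frozen=True)
class POTemplate:
    id: int
    name: str
    product: Optional[str] = None            # set for product templates
    distribution: Optional[str] = None       # set for distribution templates
    sourcepackagename: Optional[str] = None  # only meaningful with distribution
    has_translation_focus: bool = False      # assumption: copied from the series


@dataclass(frozen=True)
class POTMsgSet:
    id: int
    msgid_singular: int
    msgid_plural: Optional[int] = None
    context: Optional[str] = None


def template_class_key(template: POTemplate) -> tuple:
    """Same name within a Product, or same name and source package within a Distribution."""
    if template.product is not None:
        return ("product", template.product, template.name)
    return ("distribution", template.distribution,
            template.sourcepackagename, template.name)


def potmsgset_key(potmsgset: POTMsgSet) -> tuple:
    """Mirrors (msgid_singular, COALESCE(msgid_plural, -1), COALESCE(context, ''))."""
    return (
        potmsgset.msgid_singular,
        -1 if potmsgset.msgid_plural is None else potmsgset.msgid_plural,
        "" if potmsgset.context is None else potmsgset.context,
    )


def pick_representatives(
        attachments: Iterable[Tuple[POTemplate, POTMsgSet]]) -> Dict[int, int]:
    """Map each POTMsgSet id to the id of its equivalence-class representative.

    Each (template, potmsgset) pair stands in for a TranslationTemplateItem row.
    """
    classes = defaultdict(list)
    for template, potmsgset in attachments:
        key = (template_class_key(template), potmsgset_key(potmsgset))
        classes[key].append((template, potmsgset))

    representatives = {}
    for members in classes.values():
        # Prefer a POTMsgSet attached to a template in the focus series;
        # otherwise (and as a tie-breaker) take the highest id.
        _, representative = max(
            members, key=lambda m: (m[0].has_translation_focus, m[1].id))
        for _, potmsgset in members:
            representatives[potmsgset.id] = representative.id
    return representatives
```

For example, given a template in the focus series and an older template of the same product, two {{{POTMsgSet}}}s with the same msgid both map to the one attached to the focus series, even if it has the lower id; a {{{POTMsgSet}}} that differs only in context forms its own class.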
Pseudocode:

{{{
def get_potmsgset_key(potmsgset):
    return (potmsgset.msgid_singular, potmsgset.msgid_plural, potmsgset.context)

def merge_potmsgsets(potemplates):
    # Sort potemplates from "most representative" to "least representative."
    potemplates.sort(cmp=template_precedence)

    representatives = {}
    subordinates = {}

    # Figure out representative potmsgsets and their subordinates.
    for template in potemplates:
        for potmsgset in template.potmsgsets:
            key = get_potmsgset_key(potmsgset)
            if key not in representatives:
                representatives[key] = potmsgset
            representative = representatives[key]
            if representative not in subordinates:
                subordinates[representative] = []
            # A potmsgset may already be shared between templates in this
            # class (e.g. after a restart): never list the representative as
            # its own subordinate, and never list a subordinate twice.
            if (potmsgset != representative and
                    potmsgset not in subordinates[representative]):
                subordinates[representative].append(potmsgset)

    for representative, potmsgsets in subordinates.iteritems():
        for subordinate in potmsgsets:
            merge_translationtemplateitems(subordinate, representative)
            for message in subordinate.translation_messages:
                if message.potemplate is None:
                    # Guard against multiple shared imported messages.
                    if message.is_imported:
                        imported = representative.getImportedMessage(
                            message.language, message.variant, potemplate=None)
                        if imported is not None:
                            message.is_imported = False
                    # Guard against multiple shared current messages.
                    if message.is_current:
                        current = representative.getCurrentMessage(
                            message.language, message.variant, potemplate=None)
                        if current is not None:
                            message.is_current = False
                message.potmsgset = representative
            subordinate.destroy()
}}}

=== Merge TranslationMessages ===

This phase operates only on {{{TranslationMessage}}}, and can be done at leisure.

Use {{{TranslationMessage.converge()}}} as defined above. Run it first on all {{{TranslationMessage}}}s in the distroseries and productseries that have translation focus, then on all other series, from most representative to least representative.

Pseudocode:

{{{
def merge_translationmessages(potemplates):
    # Sort potemplates from "most representative" to "least representative."
    potemplates.sort(cmp=template_precedence)

    for template in potemplates:
        for potmsgset in template.potmsgsets:
            for message in potmsgset.getAllTranslationMessages():
                message.converge()
}}}

== Database constraints ==

Once the assumptions are fulfilled:

 * If the {{{TranslationMessage.pofile}}} column still exists, drop all constraints involving it.

After rolling out code changes:

After migration:

 * On {{{TranslationMessage}}}:
  * potemplate is null where is_imported
  * potemplate is null where not is_current
  * unique (potmsgset, COALESCE(potemplate, -1), language, COALESCE(variant, ''), COALESCE(msgstr0, -1), …)
  * unique (potmsgset, COALESCE(potemplate, -1), language, COALESCE(variant, '')) where is_current is true
  * unique (potmsgset, COALESCE(potemplate, -1), language, COALESCE(variant, '')) where is_imported is true

== Checks ==

Some things we should check regularly after migration:

 * Due to race conditions, a diverged message may occasionally end up with the same potmsgset and translations as a shared one. We can only guard against that efficiently in Python, so we also need to run routine checks and cleanups on the database.
 * Any rows returned by this query would be redundant:

{{{
SELECT diverged_tm.id, converged_tm.id, diverged_tm.is_current, diverged_tm.is_imported
FROM TranslationMessage diverged_tm
JOIN TranslationMessage converged_tm ON
    diverged_tm.potmsgset = converged_tm.potmsgset AND
    diverged_tm.language = converged_tm.language AND
    diverged_tm.potemplate IS NOT NULL AND
    COALESCE(diverged_tm.variant, '') = COALESCE(converged_tm.variant, '') AND
    COALESCE(diverged_tm.msgstr0, -1) = COALESCE(converged_tm.msgstr0, -1) AND
    COALESCE(diverged_tm.msgstr1, -1) = COALESCE(converged_tm.msgstr1, -1) AND
    COALESCE(diverged_tm.msgstr2, -1) = COALESCE(converged_tm.msgstr2, -1) AND
    COALESCE(diverged_tm.msgstr3, -1) = COALESCE(converged_tm.msgstr3, -1) AND
    COALESCE(diverged_tm.msgstr4, -1) = COALESCE(converged_tm.msgstr4, -1) AND
    COALESCE(diverged_tm.msgstr5, -1) = COALESCE(converged_tm.msgstr5, -1)
WHERE
    converged_tm.potemplate IS NULL AND
    converged_tm.is_current;
}}}

 * Use {{{converge()}}} to bring these redundant messages back into the fold.
 * Every (potmsgset, potemplate) combination in {{{TranslationMessage}}} with a non-null potemplate must occur in {{{TranslationTemplateItem}}}.
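The last invariant can be checked with a query along these lines. This is a minimal sketch under stated assumptions: it applies the invariant only to diverged messages (a shared message has a NULL potemplate, which can never match a {{{TranslationTemplateItem}}} row), it models only the columns the check needs, and it runs against an in-memory SQLite database through Python's {{{sqlite3}}} module purely so the example is runnable; the query itself uses no SQLite-specific syntax. The helper name {{{find_orphaned_messages}}} is hypothetical.

```python
import sqlite3

# Hypothetical, heavily trimmed schema: only the columns the check needs.
SCHEMA = """
CREATE TABLE TranslationTemplateItem (
    potemplate INTEGER NOT NULL,
    potmsgset  INTEGER NOT NULL
);
CREATE TABLE TranslationMessage (
    id         INTEGER PRIMARY KEY,
    potmsgset  INTEGER NOT NULL,
    potemplate INTEGER  -- NULL means the message is shared
);
"""

# Diverged messages whose (potmsgset, potemplate) pair has no matching
# TranslationTemplateItem row. Shared messages are deliberately skipped.
ORPHAN_CHECK = """
SELECT tm.id, tm.potmsgset, tm.potemplate
FROM TranslationMessage tm
WHERE tm.potemplate IS NOT NULL
  AND NOT EXISTS (
      SELECT 1
      FROM TranslationTemplateItem tti
      WHERE tti.potmsgset = tm.potmsgset
        AND tti.potemplate = tm.potemplate)
ORDER BY tm.id
"""


def find_orphaned_messages(conn: sqlite3.Connection) -> list:
    """Return (id, potmsgset, potemplate) for every message violating the invariant."""
    return conn.execute(ORPHAN_CHECK).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO TranslationTemplateItem VALUES (1, 10)")
    conn.executemany(
        "INSERT INTO TranslationMessage VALUES (?, ?, ?)",
        [(100, 10, 1),      # diverged, attached to template 1: fine
         (101, 10, None),   # shared: not covered by this check
         (102, 10, 2)])     # diverged, but template 2 has no potmsgset 10
    print(find_orphaned_messages(conn))  # [(102, 10, 2)]
```

Each returned row is a diverged message whose template does not contain its {{{POTMsgSet}}}.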