Diff for "LEP/MailArchiver"


Differences between revisions 1 and 2
Revision 1 as of 2012-01-04 22:49:27
Size: 3950
Editor: sinzui
Comment:
Revision 2 as of 2012-01-04 22:50:04
Size: 3954
Editor: sinzui
Comment:

MailArchive
===========

Launchpad uses MHonArc to archive emails and show them on the Web.
There are several problems:
    * Archiving is slow: it can take days for a message to appear in
      the web archive even though the message was delivered in minutes.
    * The web pages do not look like Lp pages, nor do they behave like
      Lp pages.
    * Private web archives are not accessible because of OpenID
      authentication issues.

Mailman's internal archiver is pipermail. It maintains a canonical
representation of all messages in mbox format and generates HTML
from templates. It supports monthly mbox archiving, which reduces
the burden of generating pages for all the messages. Pipermail is
considered underdeveloped and lacks the features that modern archives
need; see http://wiki.list.org/display/DEV/ModernArchiving.
The Mailman configuration can be set to use the internal archiver; see
http://terri.zone12.com/doc/mailman/mailman-admin/node27.html.
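
Monthly mbox archiving boils down to bucketing each message by the month
in its Date header and appending it to that month's mbox. The following is a
minimal sketch using Python's standard ``mailbox`` and ``email`` modules. It
assumes a hypothetical ``<list>-YYYY-MM.mbox`` file naming scheme and falls
back to the current month when the Date header is missing or unparseable.
Neither choice describes pipermail's actual behaviour, and the helper names
are hypothetical:

```python
import email
import email.utils
import mailbox
import os
from datetime import datetime, timezone


def month_key(msg):
    """Return 'YYYY-MM' from the Date header (assumed fallback: now, in UTC)."""
    try:
        when = email.utils.parsedate_to_datetime(msg["Date"])
    except (TypeError, ValueError):
        # Missing or unparseable Date header.
        when = datetime.now(timezone.utc)
    return when.strftime("%Y-%m")


def archive_to_monthly_mbox(archive_dir, list_name, raw_message):
    """Append one raw message (bytes) to the monthly mbox it belongs to."""
    msg = email.message_from_bytes(raw_message)
    # Hypothetical naming scheme: one mbox file per list per month.
    path = os.path.join(archive_dir, f"{list_name}-{month_key(msg)}.mbox")
    box = mailbox.mbox(path)  # the file is created on first use
    box.lock()
    try:
        box.add(msg)
        box.flush()
    finally:
        box.unlock()
        box.close()
    return path
```

Only the current month's file is written when a message arrives, so the cost
of archiving does not grow with the size of the whole archive.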

No existing mail archiver meets Lp's needs. Most large-scale hosts
write their own service or heavily customise a mediocre archiver to
meet their needs.


Launchpad needs
---------------

Launchpad wants to access the archive through an API so that mail messages
can be integrated with other Lp pages:
    * Messages arrive in the archive at about the same time as they
      arrive in subscriber inboxes.
      * The message sent to subscribers might contain a permalink to
        the message in the archive.
    * A page lists the monthly date and thread indexes for the archive.
      * Presenting monthly slices of messages helps users to locate
        messages and gives a sense of their age.
      * The page might show a summary of the volume of messages per month
        to give a sense of the archive's size.
      * The page might have an RSS feed link.
    * The index pages show each message's subject, author, and date.
      * The page might show a summary of the volume of messages per week/day.
    * The message page shows the message with linked content.
      * The page has navigation to see the sibling messages.
      * The message has a permalink.
      * Team admins can toggle message visibility
        (to hide spam, abuse, or user data; hiding a message currently
        requires a LOSA and a custom script).
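
The index pages above need only a few header fields per message. Below is a
sketch of how a date index and the per-month volume summary could be derived.
It assumes every message has a parseable Date header with a UTC offset,
because mixing offset-aware and naive dates would break the sort. Encoded
(RFC 2047) headers are not decoded here. ``index_entry``, ``date_index``, and
``monthly_volume`` are hypothetical names:

```python
import email
import email.utils
from collections import Counter


def index_entry(raw_message):
    """Extract the fields an index page shows: subject, author, and date."""
    msg = email.message_from_bytes(raw_message)
    name, address = email.utils.parseaddr(msg.get("From", ""))
    return {
        "message_id": msg.get("Message-ID", "").strip(),
        "subject": msg.get("Subject", "(no subject)"),
        # Prefer the display name, fall back to the bare address.
        "author": name or address,
        # Assumes a valid Date header with a UTC offset.
        "date": email.utils.parsedate_to_datetime(msg["Date"]),
    }


def date_index(raw_messages):
    """Entries ordered by date, as a monthly date index would list them."""
    return sorted((index_entry(raw) for raw in raw_messages),
                  key=lambda entry: entry["date"])


def monthly_volume(entries):
    """Count messages per 'YYYY-MM' for the archive's summary page."""
    return Counter(entry["date"].strftime("%Y-%m") for entry in entries)
```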

Mailman needs a command, or an ArchRunner class, to send each message to.
Mailman expects an exit/return code that tells it whether the message was
archived or must be re-enqueued to try again. Mailman assumes that the
archiver creates an archive as needed (it currently passes a lot of data to
the mhonarc command to ensure messages are added to the right archive).
It is easier to add an e
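
The contract described above is simple: a message goes in, and an exit code
says whether it was archived or must be re-enqueued. A hypothetical
mail-archive-command honouring that contract might look like the following.
Several details are assumptions for illustration, not Mailman's documented
interface: the argument layout (the list name as the only argument, the raw
message on stdin), the ``store`` callable, and the use of 75 (``EX_TEMPFAIL``
from sysexits.h) as the retry code:

```python
import sys

EX_OK = 0
EX_USAGE = 64      # bad invocation: retrying will not help
EX_TEMPFAIL = 75   # assumed "re-enqueue and try again" signal


class TemporaryFailure(Exception):
    """Raised by a store when the archive service is unavailable."""


def main(argv, stdin, store):
    """Archive one message and return the exit code Mailman would see.

    argv:  [program, list_name]   (assumed layout)
    stdin: binary stream holding the raw RFC 2822 message
    store: hypothetical callable(list_name, raw_bytes) that creates the
           list's archive on first use and adds the message to it
    """
    if len(argv) != 2:
        print(f"usage: {argv[0]} LIST_NAME < message", file=sys.stderr)
        return EX_USAGE
    list_name = argv[1]
    raw = stdin.read()
    try:
        store(list_name, raw)
    except TemporaryFailure as error:
        print(f"archive unavailable, retry later: {error}", file=sys.stderr)
        return EX_TEMPFAIL
    return EX_OK
```

A real entry point would call ``sys.exit(main(sys.argv, sys.stdin.buffer,
store))`` with a store backed by mailarchivelib. Keeping "create the archive if
needed" inside the store means Mailman no longer has to pass archive-location
details on every call.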


Diagram of interaction
----------------------

::

    Mailman
        .
        . <create-archive> <add-message>
        .
        v

    mail-archive-command (POSIX)
        |
        |
        +-- mailarchivelib
        |
        |
    Mail-Archive-Service (ReST/JSON)

        ^
        . <get-month-indexes> <get-date-index> <get-thread-index> <get-message>
        . <hide-message> <import-mbox*> <forward-message*>
        .
    Launchpad

\* Actions marked with an asterisk are not essential.
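
To make the diagram more concrete, here is a hypothetical sketch of
mailarchivelib as a thin client of the Mail-Archive-Service. The class name,
the URL layout, and the use of PATCH for hide-message are assumptions, since
the service's resources are not defined here. The methods build
``urllib.request.Request`` objects without sending them. A caller
(mail-archive-command or Launchpad) would pass each one to
``urllib.request.urlopen``:

```python
import json
import urllib.parse
import urllib.request


class MailArchiveClient:
    """Hypothetical mailarchivelib client; every URL path is an assumption."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")

    def _url(self, *parts):
        # Quote each path segment so Message-IDs like <1@host> are URL-safe.
        quoted = (urllib.parse.quote(part, safe="") for part in parts)
        return "/".join([self.base_url, *quoted])

    def create_archive(self, list_name):
        body = json.dumps({"name": list_name}).encode("utf-8")
        return urllib.request.Request(
            self._url("archives"), data=body, method="POST",
            headers={"Content-Type": "application/json"})

    def add_message(self, list_name, raw_message):
        return urllib.request.Request(
            self._url("archives", list_name, "messages"), data=raw_message,
            method="POST", headers={"Content-Type": "message/rfc822"})

    def get_month_indexes(self, list_name):
        return urllib.request.Request(self._url("archives", list_name, "months"))

    def get_message(self, list_name, message_id):
        return urllib.request.Request(
            self._url("archives", list_name, "messages", message_id))

    def hide_message(self, list_name, message_id, hidden=True):
        body = json.dumps({"hidden": hidden}).encode("utf-8")
        return urllib.request.Request(
            self._url("archives", list_name, "messages", message_id),
            data=body, method="PATCH",
            headers={"Content-Type": "application/json"})
```

Returning unsent requests keeps the URL scheme testable without a running
service. The remaining actions (get-date-index, get-thread-index, import-mbox,
forward-message) would follow the same pattern.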


Design considerations
---------------------

* mbox is the standard format for storing a collection of messages.
  * Importing and exporting the mbox format is a requirement, but mbox is
    not necessarily the mechanism for managing indexes or serving
    individual messages quickly.
  * A common strategy to ensure quick archiving is to create monthly
    mboxes for each list. This makes monthly and date presentations
    easy too, but it complicates thread indexes, since a thread might
    span many mboxes.
* A ReST/JSON web service API is desirable because Lp pages could use
  AJAX to interact with it.
  * We do not intend to give browsers direct access to the data,
    because we *think* we want to enhance the message data with
    links to real users.
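
The thread-index complication can be handled by scanning every monthly mbox
before threading, because a reply in February may answer a message from
January. The following is a minimal sketch. It assumes every message has a
Message-ID header and takes the parent to be the last References entry, or
else In-Reply-To. This is a simplification of full JWZ-style threading.
``parent_id`` and ``thread_roots`` are hypothetical names:

```python
import email
from collections import defaultdict


def parent_id(msg):
    """The message this one replies to: last References entry, else In-Reply-To."""
    references = (msg.get("References") or "").split()
    if references:
        return references[-1]
    in_reply_to = (msg.get("In-Reply-To") or "").split()
    return in_reply_to[0] if in_reply_to else None


def thread_roots(monthly_mboxes):
    """Map each thread's root Message-ID to every message ID in that thread.

    monthly_mboxes: iterable of iterables of email messages, one per month
    (for example mailbox.mbox objects). All months are read before threading.
    """
    parents = {}
    for month in monthly_mboxes:
        for msg in month:
            parents[msg["Message-ID"].strip()] = parent_id(msg)

    def root_of(message_id):
        seen = set()
        # Walk up while the parent is archived; stop if a cycle is detected.
        while parents.get(message_id) in parents and message_id not in seen:
            seen.add(message_id)
            message_id = parents[message_id]
        return message_id

    threads = defaultdict(list)
    for message_id in parents:
        threads[root_of(message_id)].append(message_id)
    return dict(threads)
```

A reply whose parent is not in the archive simply starts its own thread.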

LEP/MailArchiver (last edited 2012-01-09 11:35:32 by sinzui)