Mail Archive

A service that archives mailing list emails and provides an API for other applications to retrieve the messages.

Contact: Curtis (irc: sinzui)
On Launchpad: ml-archive-sucks bug tag

This is not a web/html archive. It is not a secondary means of subscribing to lists or forwarding messages (but future extensions could do this).

Rationale

1. Performance problems
Users frequently report that messages take days to appear in the HTML archive.

2. Availability problems
Private archives are not accessible because of an OpenID authentication regression.

3. Integration problems
The index and message pages do not look or behave like Lp app pages, which causes confusion.

We want a fast and reliable means to store Mailman messages, and to show those messages in Launchpad.

Stakeholders

Canonical groups such as OEM, DX, UX, and U1 use private teams and mailing lists, but their archives are not visible. The stakeholder list is one such list: its archive cannot be accessed to review previous discussion about this very issue.

High profile projects like OpenStack do not believe their messages are being delivered because the messages do not appear in the archive yet.

User stories

As a Mailman instance
I want messages archived quickly
so that I can keep the ArchRunner queue at zero
This could also be phrased from the point of view of a sender to the list: I want to see my message in the archive to be certain it was forwarded to other users.

As a team member
I want the footer of the email to include a link to the message at Lp
so that I can refer other users to the message

As a team member
I want the message pages to include standard Lp links
so that I can navigate to users, bugs, and other areas of Lp

As a team admin
I want to use my existing MBoxes from other archivers
so that I can keep the list history

As a private team member
I want to see the messages in the archive
so that I can review previous conversations

As a team admin
I want to hide messages in the archive from non-admins
so that spam, abuse, and user data are not shown in web pages
LOSA use a custom script to do this at the request of users, because mhonarc's own "delete" is not reliable.

Constraints and Requirements

Must

1. Integrate with Mailman's archiver mechanism.

2. Append new messages to the MBox quickly
MBox is the standard for storing messages. Users expect that format when importing or exporting their list data. MBox is not a fast format for managing date and thread views of the messages, or retrieving messages, which is a common performance issue with mailing list archivers.

3. Allow us to import the existing MBox data.

4. Permit exporting MBOXes

5. Provide a web service that permits Lp to:

  1. Get a list of months when messages were sent to the list.
  2. Get a list of messages by date or thread for a month
  3. Get a message
  4. Allow the team admin to toggle message visibility
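
The append and month-listing requirements above can be sketched with Python's standard mailbox module. This is only an illustration under assumptions: the function names add_message and months_with_messages are hypothetical, and the proposal does not say how the archive library is implemented. The month listing scans the whole MBox, which is exactly the slow path described above; a real service would keep a separate index.

```python
import mailbox
from email import message_from_bytes
from email.utils import parsedate_to_datetime


def add_message(mbox_path, raw_message):
    """Append one message (raw bytes) to the MBox at mbox_path."""
    box = mailbox.mbox(mbox_path)  # the file is created if it is missing
    box.lock()
    try:
        # add() appends to the end of the file; nothing is rewritten.
        box.add(message_from_bytes(raw_message))
        box.flush()
    finally:
        box.unlock()
        box.close()


def months_with_messages(mbox_path):
    """Return the sorted 'YYYY-MM' months that contain a message.

    This reads every message, so it gets slower as the archive grows.
    """
    months = set()
    box = mailbox.mbox(mbox_path)
    try:
        for message in box:
            if message["Date"] is None:
                continue  # undated messages are skipped in this sketch
            sent = parsedate_to_datetime(message["Date"])
            months.add(sent.strftime("%Y-%m"))
    finally:
        box.close()
    return sorted(months)
```

Keeping the append path this small is what lets the archiver return to Mailman quickly; index building can happen later, outside the ArchRunner call.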

Nice to have

1. Support a predictable id to store and retrieve messages by.
Messages forwarded by mailman could include a link to where the message will be in the archive. Importing or hiding messages will not change the id used to retrieve the message.

2. Use ReST/JSON as the webservice protocol and format.

3. Provide data to show the volume of messages per month, week, and day.
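
One possible way to get a predictable id is to derive it from the Message-ID header, so that Mailman can compute the archive link before the message is stored. The sketch below assumes that approach; the function name message_archive_id and the choice of SHA-1 with Base32 are illustrative assumptions, not part of this proposal. Messages without a Message-ID would need a fallback that is not shown.

```python
import base64
import hashlib


def message_archive_id(message_id):
    """Return a stable, URL-safe id derived from a Message-ID header value."""
    # Drop whitespace and angle brackets so "<x@y>" and "x@y" give the same id.
    normalized = message_id.strip().lstrip("<").rstrip(">")
    digest = hashlib.sha1(normalized.encode("utf-8")).digest()
    # Base32 output is upper-case letters and digits only, so it is safe in a URL.
    return base64.b32encode(digest).decode("ascii")
```

Because the id depends only on the header, importing an old MBox or hiding a message leaves the id, and therefore the link in the email footer, unchanged.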

Must not

1. Delay archiving a message from mailman to do secondary work.

2. Serve data/pages directly to users.

Out of scope

1. Import an MBox from the Lp webapp

2. Forward a message to a user as if he was subscribed

3. Provide a feed of the latest messages in the archive

Subfeatures

1. Provide a library that manages how the commands/features work with the archive data, indexes, and messages.

2. Provide a command line tool for mailman and admins to work with the archive.

3. Provide a web service that Lp can integrate with.
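
A rough sketch of how the command line tool might be laid out, assuming it is written in Python with argparse. The program name mail-archive-command and the create-archive and add-message actions are taken from the interaction diagram in this document; the argument names and the returned dict are assumptions made for illustration. A real tool would pass the message to the library instead of returning a summary.

```python
import argparse
import sys


def build_parser():
    """Build a parser with one sub-command per archive action."""
    parser = argparse.ArgumentParser(prog="mail-archive-command")
    commands = parser.add_subparsers(dest="command", required=True)

    create = commands.add_parser(
        "create-archive", help="create an empty archive for a list")
    create.add_argument("list_name")

    add = commands.add_parser(
        "add-message", help="archive one message read from stdin")
    add.add_argument("list_name")
    return parser


def main(argv=None, stdin=None):
    """Parse the arguments and report what would be done."""
    args = build_parser().parse_args(argv)
    result = {"command": args.command, "list": args.list_name}
    if args.command == "add-message":
        source = stdin if stdin is not None else sys.stdin.buffer
        raw = source.read()
        # Placeholder: a real tool would hand `raw` to the library here.
        result["bytes"] = len(raw)
    return result
```

Reading the message from stdin matches how Mailman can pipe a message to an external archiver command.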

Success

How will we know when we are done?

1. Users can see list messages in the Lp app with bugs and people linked, e.g. https://launchpad.net/~launchpad-dev/+mailing-list-archive/+message/nnn

2. Private teams can see the messages sent to their list.

3. List emails include a permalink to the message at Lp in the footer.

How will we measure how well we have done?

1. The Mailman ArchRunner queue will have fewer than 10 messages at any one time.

2. A message sent to a list is accessible in Launchpad within a minute of it arriving in the archive.

3. Members of teams with large lists can find a message in less than two minutes if they know the subject and the date +/- 1 day.

Thoughts?

Background

Mailman's internal archiver is Pipermail. It maintains a canonical representation of all messages in mbox format. It generates HTML using templates. It supports monthly mbox archiving, which reduces the burden of generating pages for all the messages. Pipermail is considered to be under-developed and needs features to support modern needs (see http://wiki.list.org/display/DEV/ModernArchiving ). The Mailman config can be set to use the internal archiver (see http://terri.zone12.com/doc/mailman/mailman-admin/node27.html ).

There are no mail archivers that meet Lp's needs. Most large scale hosters write their own service or make extensive customisations to the mediocre archivers to meet their needs.

* MBox is the standard for storing a collection of messages.

* ReST/JSON is desirable for webservice API because we could use

Diagram of interaction

Mailman 
    .
    . <create-archive> <add-message>
    .
    v

mail-archive-command (Posix)
    |
    |
     -- mailarchivelib
    |
    |
Mail-Archive-Service (ReST/JSON)

    ^
    . <get-month-indexes> <get-date-index> <get-thread-index> <get-message>
    . <hide-message> <import-mbox*> <forward-message*>
    .
Launchpad

* actions are not essential
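
The get-message and hide-message actions in the diagram can be illustrated with a small in-memory stand-in for the service. This is only a sketch: the class name ArchiveIndex, the field names, and the is_admin flag are assumptions, and the real service would sit behind ReST/JSON and read from the archive rather than from a dict.

```python
import json


class ArchiveIndex:
    """In-memory stand-in for the per-list message index (hypothetical)."""

    def __init__(self):
        self._messages = {}  # message id -> dict of message fields

    def add(self, message_id, subject, date, body):
        """Record a message; new messages start out visible."""
        self._messages[message_id] = {
            "id": message_id,
            "subject": subject,
            "date": date,
            "body": body,
            "hidden": False,
        }

    def hide_message(self, message_id, hidden=True):
        """Toggle visibility; the message and its id are kept."""
        self._messages[message_id]["hidden"] = hidden

    def get_message(self, message_id, is_admin=False):
        """Return the message as JSON, or None if missing or hidden."""
        message = self._messages.get(message_id)
        if message is None or (message["hidden"] and not is_admin):
            return None
        return json.dumps(message, sort_keys=True)
```

Hiding sets a flag instead of deleting, so the MBox stays complete for export and the admin can make the message visible again.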

LEP/MailArchiver (last edited 2012-01-09 11:35:32 by sinzui)