Diff for "LEP/MailArchiver"


Differences between revisions 2 and 3
Revision 2 as of 2012-01-04 22:50:04 (size: 3954, editor: sinzui)
Revision 3 as of 2012-01-05 21:51:12 (size: 7000, editor: sinzui)
Mail Archive

A service that archives mailing list emails and provides an API for other applications to retrieve the messages.

Contact: Curtis (irc: sinzui)
On Launchpad: ml-archive-sucks bug tag

This is not a web/html archive. It is not a secondary means of subscribing to lists or forwarding messages (but future extensions could do this).

Rationale

1. Performance problems
Users frequently report that messages take days to appear in the HTML archive.

2. Availability problems
Private archives are not accessible because of an OpenID authentication regression.

3. Integration problems
The index and message pages do not look or behave like Lp app pages, which causes confusion.

We want a fast and reliable means to store Mailman messages and to show those messages in Launchpad.

Stakeholders

Canonical groups such as OEM, DX, UX, and U1 use private teams and mailing lists, but their archives are not visible. The stakeholder archive is one such list; it cannot be accessed to review previous discussion about this very issue.

High-profile projects like OpenStack believe their messages are not being delivered because the messages do not yet appear in the archive.

User stories

As a Mailman instance
I want messages archived quickly
so that I can keep the ArchRunner queue at zero
This could also be phrased from the sender's point of view: as a sender to the list, I want to see my message in the archive so that I can be certain it was forwarded to other users.

As a team member
I want the footer of the email to include a link to the message at Lp
so that I can refer other users to the message

As a team member
I want the message pages to include standard Lp links
so that I can navigate to users, bugs, and other areas of Lp

As a team admin
I want to use my existing MBoxes from other archivers
so that I can keep the list history

As a private team member
I want to see the messages in the archive
so that I can review previous conversations

As a team admin
I want to hide messages in the archive from non-admins
so that spam, abuse, and user data are not shown in web pages
LOSAs currently do this with a custom script at users' request; MHonArc's own "delete" is not reliable.

Constraints and Requirements

Must

1. Integrate with Mailman's archiver mechanism.

2. Append new messages to the MBox quickly
MBox is the standard for storing messages. Users expect that format when importing or exporting their list data. However, MBox is not a fast format for managing date and thread views of the messages or for retrieving individual messages; this is a common performance issue with mailing list archivers.

3. Allow us to import the existing MBox data.

4. Provide a web service that permits Lp to:

  1. Get a list of months when messages were sent to the list.
  2. Get a list of messages by date or thread for a month.
  3. Get a message.
  4. Allow the team admin to toggle message visibility.
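The web service operations in item 4 could sit on top of a small index kept beside the MBox files. The sketch below is a minimal, in-memory illustration only. `ArchiveIndex`, `MessageSummary`, and their methods are hypothetical names, not part of Mailman or Launchpad, and a real service would persist the index. Hiding a message only flips a flag, so the message stays in the archive and keeps its id.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class MessageSummary:
    """Hypothetical per-message record: what index pages need, not the full body."""
    message_id: str
    subject: str
    author: str
    date: datetime
    hidden: bool = False


class ArchiveIndex:
    """Hypothetical in-memory index over one list's archive (illustration only)."""

    def __init__(self):
        self._messages = {}

    def add(self, summary):
        self._messages[summary.message_id] = summary

    def months(self):
        """Months (YYYY-MM) in which messages were sent, oldest first."""
        return sorted({m.date.strftime("%Y-%m") for m in self._messages.values()})

    def date_index(self, month, include_hidden=False):
        """Messages sent in `month`, ordered by date; hidden ones only for admins."""
        found = [
            m for m in self._messages.values()
            if m.date.strftime("%Y-%m") == month and (include_hidden or not m.hidden)
        ]
        return sorted(found, key=lambda m: m.date)

    def get_message(self, message_id, include_hidden=False):
        """Return one message summary, or None if unknown or hidden from the caller."""
        m = self._messages.get(message_id)
        if m is None or (m.hidden and not include_hidden):
            return None
        return m

    def set_hidden(self, message_id, hidden):
        """Toggle visibility; the message stays in the archive under the same id."""
        self._messages[message_id].hidden = hidden
```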

Nice to have

1. Support a predictable id by which to store and retrieve messages
Messages forwarded by mailman could include a link to where the message will be in the archive. Importing or hiding messages will not change the id used to retrieve the message.

2. Use ReST/JSON as the webservice protocol and format.

3. Provide data to show the volume of messages per month, week, and day.
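Item 3 amounts to grouping message dates into buckets. The following is a minimal sketch using only the standard library. It assumes the archive can supply each message's sent date as a `datetime.date` (call `.date()` on a `datetime`). `volume` is a hypothetical helper name. Weeks use ISO week numbering, so the first days of January can fall in the previous year's last week.

```python
from collections import Counter


def volume(dates):
    """Count messages per month, per ISO week, and per day from their sent dates."""
    per_month = Counter(d.strftime("%Y-%m") for d in dates)
    # isocalendar() gives (ISO year, ISO week, weekday); the ISO year can differ
    # from the calendar year around New Year.
    per_week = Counter("%d-W%02d" % tuple(d.isocalendar()[:2]) for d in dates)
    per_day = Counter(d.isoformat() for d in dates)
    return per_month, per_week, per_day
```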

Must not

1. Delay archiving a message from mailman to do secondary work.

2. Serve data/pages directly to users.

Out of scope

1. Import an MBox from the Lp webapp

2. Forward a message to a user as if they were subscribed

3. Provide a feed of the latest messages in the archive

Subfeatures

1. Provide a library that manages how the commands/features work with the archive data, indexes, and messages.

2. Provide a command line tool for mailman and admins to work with the archive.

3. Provide a web service that Lp can integrate with.
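The command line tool in item 2 is also the piece Mailman would call. The sketch below makes three hypothetical assumptions: the caller passes the list name as an argument, it supplies the raw message on standard input, and it treats a non-zero exit status as a failure. The `--root` default, the single `archive.mbox` file name, and the sysexits-style codes are illustrative choices, not an existing interface. The two sub-commands mirror the `<create-archive>` and `<add-message>` actions in the interaction diagram.

```python
import argparse
import mailbox
import os
import sys
from email import message_from_binary_file

# Exit codes borrowed from sysexits.h (an assumption about what the caller expects).
EX_OK, EX_DATAERR, EX_TEMPFAIL = 0, 65, 75


def main(argv=None, stdin=None):
    parser = argparse.ArgumentParser(prog="mail-archive-command")
    parser.add_argument("--root", default="/var/lib/mail-archive")  # hypothetical path
    sub = parser.add_subparsers(dest="action", required=True)
    for name in ("create-archive", "add-message"):
        sub.add_parser(name).add_argument("list_name")
    args = parser.parse_args(argv)
    archive_dir = os.path.join(args.root, args.list_name)

    if args.action == "create-archive":
        os.makedirs(archive_dir, exist_ok=True)
        return EX_OK

    stream = stdin if stdin is not None else sys.stdin.buffer
    msg = message_from_binary_file(stream)
    if msg["Message-ID"] is None:
        return EX_DATAERR  # retrying cannot fix a malformed message
    try:
        os.makedirs(archive_dir, exist_ok=True)  # create the archive as needed
        box = mailbox.mbox(os.path.join(archive_dir, "archive.mbox"))
        try:
            box.lock()
            box.add(msg)  # appends to the end of the file
        finally:
            box.close()  # flushes and releases the lock
    except (OSError, mailbox.Error):
        return EX_TEMPFAIL  # e.g. lock held or disk full: ask the caller to retry
    return EX_OK


if __name__ == "__main__":
    sys.exit(main())
```

A message without a Message-ID gets a data error, because re-queueing it would not help. Lock and disk errors return a temporary-failure code so that the message can be re-queued.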

Success

How will we know when we are done?

1. Users can see list messages in the Lp app with bugs and people linked, e.g. https://launchpad.net/~launchpad-dev/+mailing-list-archive/+message/nnn

2. Private teams can see the messages sent to their list.

3. List emails include a permalink to the message at Lp in the footer.
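A footer permalink only works if the message's archive id is known before the message is archived. That is what the predictable id under "Nice to have" asks for. One possible scheme, sketched here as an assumption, hashes the Message-ID header so that the id never changes when an MBox is imported or a message is hidden. `archive_id` and `permalink` are hypothetical helpers. The URL follows the example pattern above, with the hash standing in for `nnn`. Messages without a Message-ID, or with duplicate ones, would need separate handling.

```python
import base64
import hashlib


def archive_id(message_id_header):
    """Derive a stable archive id from a Message-ID header (hypothetical scheme)."""
    # Normalise: drop surrounding whitespace and angle brackets.
    raw = message_id_header.strip().strip("<>")
    digest = hashlib.sha1(raw.encode("utf-8")).digest()
    # A 20-byte digest encodes to exactly 32 base32 characters, with no padding,
    # and base32 is safe to put in a URL path.
    return base64.b32encode(digest).decode("ascii")


def permalink(team, message_id_header):
    """Build a link following the example URL pattern in the Success section."""
    return "https://launchpad.net/~%s/+mailing-list-archive/+message/%s" % (
        team,
        archive_id(message_id_header),
    )
```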

How will we measure how well we have done?

1. The Mailman ArchRunner queue will have fewer than 10 messages at any one time.

2. A message sent to a list is accessible in Launchpad within a minute of it arriving in the archive.

3. Members of teams with large lists can find a message in less than two minutes if they know the subject and the date +/- 1 day.

Thoughts?

background

Mailman's internal archiver is Pipermail. It maintains a canonical representation of all messages in mbox format and generates HTML using templates. It supports monthly mbox archiving, which reduces the burden of generating pages for all the messages. Pipermail is considered to be under-developed and needs features to support modern use (see http://wiki.list.org/display/DEV/ModernArchiving). The Mailman config can be set to use the internal archiver; see http://terri.zone12.com/doc/mailman/mailman-admin/node27.html
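Monthly mbox archiving keeps appends cheap, because each new message is written to the end of one small file. The following is a minimal sketch using Python's standard `mailbox` module. It assumes that messages carry a parseable `Date` header and that each list has its own directory of `YYYY-MM.mbox` files. `monthly_mbox_path` and `append_message` are hypothetical names, not part of Mailman or Pipermail.

```python
import mailbox
import os
from email.utils import parsedate_to_datetime


def monthly_mbox_path(archive_root, list_name, msg):
    """Path of the monthly mbox a message belongs to, e.g. <root>/<list>/2012-01.mbox."""
    sent = parsedate_to_datetime(str(msg["Date"]))  # assumes a valid Date header
    return os.path.join(archive_root, list_name, sent.strftime("%Y-%m") + ".mbox")


def append_message(archive_root, list_name, msg):
    """Append one message to its monthly mbox and return that mbox's path."""
    path = monthly_mbox_path(archive_root, list_name, msg)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    box = mailbox.mbox(path)  # the file is created on first use
    try:
        box.lock()
        box.add(msg)  # appends at the end; the existing messages are not rewritten
    finally:
        box.close()  # flushes and releases the lock
    return path
```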

There are no mail archivers that meet Lp's needs. Most large-scale hosting providers write their own service or make extensive customisations to mediocre archivers to meet their needs.

* MBox is the standard for storing a collection of messages.

  • Importing and exporting mbox format is a requirement, but it is not necessarily the mechanism for managing indexes or serving individual messages quickly.
  • A common strategy to ensure quick archiving is to create monthly mboxes for each list. This also makes monthly and date presentations easy, but it complicates thread indexes, since threads might span many mboxes.

* ReST/JSON is desirable for the webservice API because we could use AJAX to interact with it.

  • We do not intend to permit browsers to have direct access to the data, because we *think* we want to enhance the message data with links to real users.
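Because monthly mboxes split threads, the thread index has to be built from message headers across every month. The sketch below is a deliberately simplified stand-in for a full threading algorithm such as JWZ's. It takes each message's parent from the last `References` entry, falling back to `In-Reply-To`, and follows parents up to a root. `parent_of` and `thread_roots` are hypothetical helpers. A message whose parent is not in the archive starts its own thread.

```python
def parent_of(msg):
    """Parent Message-ID from the last References entry, else In-Reply-To."""
    refs = (msg.get("References") or "").split()
    if refs:
        return refs[-1]
    reply_to = msg.get("In-Reply-To")
    return reply_to.strip() if reply_to else None


def thread_roots(parents):
    """Map each Message-ID to the Message-ID of its thread root.

    `parents` maps Message-ID -> parent Message-ID (or None) for messages from
    every monthly mbox, so a reply in February joins a thread started in January.
    """
    roots = {}
    for msg_id in parents:
        seen = []
        current = msg_id
        while current not in roots:
            seen.append(current)
            parent = parents.get(current)
            if parent is None or parent not in parents or parent in seen:
                roots[current] = current  # no archived parent (or a loop): a thread root
                break
            current = parent
        root = roots[current]
        for visited in seen:
            roots[visited] = root
    return roots
```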

Diagram of interaction

Mailman
    .
    . <create-archive> <add-message>
    .
    v
mail-archive-command (Posix)
    |
    |
    |-- mailarchivelib
    |
    |
Mail-Archive-Service (ReST/JSON)
    ^
    . <get-month-indexes> <get-date-index> <get-thread-index> <get-message>
    . <hide-message> <import-mbox*> <forward-message*>
    .
Launchpad

* Actions marked with an asterisk are not essential.
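The following is a minimal sketch of the service side of the diagram. It uses only a plain WSGI callable and `json`. The URL layout (`/archives/<list>/months`, `/archives/<list>/messages/<id>`), the in-memory `ARCHIVE` data, and the use of POST to toggle visibility are all assumptions for illustration. The sketch leaves out two things: authenticating Launchpad as the only client, and an admin view that includes hidden messages.

```python
import json

# Hypothetical in-memory data: list name -> archive id -> message summary.
ARCHIVE = {
    "launchpad-dev": {
        "AAA": {"subject": "Mail archive LEP", "month": "2012-01", "hidden": False},
    },
}


def app(environ, start_response):
    """Minimal WSGI app answering get-month-indexes, get-message and hide-message."""
    method = environ["REQUEST_METHOD"]
    parts = [p for p in environ.get("PATH_INFO", "").split("/") if p]
    status, body = "404 Not Found", {"error": "not found"}
    if len(parts) >= 3 and parts[0] == "archives" and parts[1] in ARCHIVE:
        messages = ARCHIVE[parts[1]]
        if len(parts) == 3 and parts[2] == "months" and method == "GET":
            status, body = "200 OK", sorted({m["month"] for m in messages.values()})
        elif len(parts) == 4 and parts[2] == "messages" and parts[3] in messages:
            msg = messages[parts[3]]
            if method == "GET" and not msg["hidden"]:
                status, body = "200 OK", msg
            elif method == "POST":  # hide-message: toggle visibility, keep the id
                msg["hidden"] = not msg["hidden"]
                status, body = "200 OK", {"hidden": msg["hidden"]}
    payload = json.dumps(body).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(payload)))])
    return [payload]
```

For local experiments, the app could be served with `wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever()`.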

LEP/MailArchiver (last edited 2012-01-09 11:35:32 by sinzui)