Re: Ideas from Byron C. Darrah on 1998-05-05 (Hypermail Development List)

From: Byron C. Darrah <bdarr_at_sse.FU.HAC.COM_at_hypermail-project.org>
Date: Tue, 5 May 1998 18:45:19 -0700 (PDT)
Message-Id: <199805060145.SAA06101_at_pepperoni.pizza.hac.com>

> Date: Tue, 5 May 1998 17:40:13 -0700 (PDT)
> From: Jared Reisinger <feety_at_hhhh.org>
>
> On Mon, 4 May 1998, Byron C. Darrah wrote:
>
> > I don't know if I understand this right: Does this suggest replacing
> > 0000.html, 0001.html, etc with a single file? If so, then I think that
> > might be a mistake. It would require a CGI program to extract each page on
> > demand and turn it into HTML. Thus, the extraction/HTML conversion would
> > be done each time each message is read, instead of the current way which is
> > just once per message. It would mean a much higher system load for busy
> > archives.
>
> I agree that *busy* archives probably want to minimize request-time
> processing. But not-so-busy archives, or archives on beefy servers may
> not have to worry about it quite as much.
>
> Does anyone have any sense of how much overhead would be involved for
> on-the-fly generated pages using something like Hypermail's mail-to-HTML
> engine?

I admit the overhead might not be too bad if the message database is designed well. For one thing, the structure of such a system would need to distinguish message bodies from attachments, so that reading a message that happens to have one or more large attachments does not cause tons of unnecessary disk access. If that is done, and the messages in the archive aren't too big themselves (mail message bodies are usually pretty small), then on-the-fly retrieval/conversion could be very efficient: not much more overhead than if the messages were put into an "ar" archive and extracted on the fly.
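To make the body/attachment distinction concrete, here is a minimal sketch of a single-file store whose index records the body and the attachment of each message as separate byte ranges, so a body can be read without touching the attachment. The record layout and the names `append_message` and `read_body` are purely illustrative assumptions, not anything Hypermail actually implements.

```python
import io

# Hypothetical layout (an assumption for illustration, not Hypermail's format):
# each message is written as body bytes followed by attachment bytes, and a
# separate in-memory index remembers (offset, length) for each part.

def append_message(store, index, body, attachment=b""):
    """Append one message to the store and record where each part lives."""
    store.seek(0, io.SEEK_END)
    body_off = store.tell()
    store.write(body)
    att_off = store.tell()
    store.write(attachment)
    index.append({
        "body": (body_off, len(body)),
        "attachment": (att_off, len(attachment)),
    })
    return len(index) - 1  # message id is its position in the index


def read_body(store, index, msg_id):
    """Seek straight to the body; the attachment bytes are never read."""
    offset, length = index[msg_id]["body"]
    store.seek(offset)
    return store.read(length)


store = io.BytesIO()
index = []
first = append_message(store, index, b"hello body", b"X" * 1000)
second = append_message(store, index, b"second")
print(read_body(store, index, second))
```

With an index like this, the cost of serving a message body depends on the body size alone, which is the property the argument above relies on.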

However, is there really any good reason to change over to this model? I don't really see any benefit, but I do see a couple of other disadvantages, in addition to the overhead that I mentioned before:

It would be less efficient to index such an archive with a standard web search engine. Most web search engines have a feature that lets you index files that reside on a local disk system with great efficiency using local disk accesses. However, with the monolithic system, any index/search program would have to access the entire message base one message at a time through the proposed on-the-fly hypermail CGI converter over a network connection.

Someday in the future, it would sure be nice to add an administrative tool that lets a site administrator delete / edit individual messages in a hypermail archive. Like (probably) most of you, I currently archive my hypermail messages redundantly in an (editable) mbox and rebuild the archive every time I need to do any such maintenance. But it would sure be great to do spot changes without having to rebuild.

If those messages are all bunched up together in a single file, this feature will be much harder to implement, and somewhat inefficient -- it will either have to leave "deleted" messages in the file (but marked off with a "tombstone"), or it will have to rebuild the whole file and index after deletions are done. And editing a message will require an even more sophisticated mechanism, since it could cause a message to grow in size.
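The two deletion strategies can be sketched roughly as follows, assuming a toy single-file store in which an index holds one offset/length entry per message. The names `build`, `tombstone`, `read` and `compact` are hypothetical and exist only for this illustration.

```python
def build(messages):
    """Concatenate messages into one blob and index each by offset/length."""
    store = bytearray()
    index = []
    for msg in messages:
        index.append({"offset": len(store), "length": len(msg), "deleted": False})
        store += msg
    return bytes(store), index


def tombstone(index, msg_id):
    """Cheap delete: mark the entry; the bytes stay in the store."""
    index[msg_id]["deleted"] = True


def read(store, index, msg_id):
    """Return the message bytes, or None if the entry is tombstoned."""
    entry = index[msg_id]
    if entry["deleted"]:
        return None
    return store[entry["offset"]:entry["offset"] + entry["length"]]


def compact(store, index):
    """Expensive delete: rewrite the whole store and index without dead entries."""
    new_store = bytearray()
    new_index = []
    for entry in index:
        if entry["deleted"]:
            continue
        data = store[entry["offset"]:entry["offset"] + entry["length"]]
        new_index.append({"offset": len(new_store), "length": len(data), "deleted": False})
        new_store += data
    return bytes(new_store), new_index


store, index = build([b"one", b"two!", b"three"])
tombstone(index, 1)                     # store is still 12 bytes long
packed, packed_index = compact(store, index)
print(len(store), len(packed))
```

Tombstoning is fast but wastes space until a compaction pass runs, and compaction shifts every later offset. Editing a message so that it grows hits the same problem, since the new bytes no longer fit in the old slot.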

> If the overhead is small enough compared to other stuff a web
> server has to do, it might be worth considering making the HTML generation
> into a request-time thing. At the very least, it could be an option, so
> that the list administrator could decide whether or not to pre-generate
> the HTML.

Yes, I for one would appreciate at least keeping an option available for not requiring on-the-fly processing.

> Part of the patch I sent to Kent was a change to make the remaining
> compile-time settings into config-file settings. It would be even cooler
> if Hypermail (or a request-time Hypermail CGI utility) could support these
> options at request time. This would allow *end-users* some control over
> how they view the archive.

This would be nice. But it can be done just as easily and efficiently if the messages reside in separate files as it can if they are all stuck together in a single file.

> Plus, it helps make a distinction between
> Hypermail's core data set (the mail archive and associated indexes) and
> the rendered output.
>
> -- Jared

Thanks for reading, it's just MHO,
--Byron Darrah

Received on Wed 06 May 1998 03:49:57 AM GMT

This archive was generated by hypermail 2.2.0 : Thu 22 Feb 2007 07:33:49 PM GMT