[hypermail] Re: performance

From: Scott Rose <srose_at_direct.ca_at_hypermail-project.org>
Date: Tue, 13 Mar 2001 23:26:31 -0800
Message-ID: <3AAF1D27.F218B2C8_at_direct.ca>


"Peter C. McCluskey" wrote:

> srose_at_direct.ca (Scott Rose) writes:
> >On an entirely different note, I have code that improves the performance
> >of hypermail, particularly in the case of large archives- hypermail
> >opens each file in an archive to build new indices each time a message
> >arrives when you run in message-at-a-time mode, and it's desperately
> >expensive. My approach uses a GDBM index so that a whole lot less I/O
> >has to take place. I've been waiting for 2.0 to ship before bringing
> >this up again... I mentioned it to Kent a year or so ago. Any interest?
> >It should be generalized beyond GDBM to be most widely useful...
>
> Could we see this code?

No, but I wanted you to know that I had it.

Just kidding! I have a version of a late 2.0 beta that has this stuff in place, but not a version of 2.0. I could either point you to a source tarball of that, or build it into 2.0 and point you to that, but it would happen a little later in the latter case.

I also found room for dramatic improvements in 1.02 that were unrelated to I/O. I fixed those, I *think*, only in my own local version; either the most egregious case was already repaired in the 2.0 beta, or I failed to look hard enough. One function was called N^2 times when it only needed to be called once. But I digress. I think the I/O proportion depends strongly on how the message store is accessed: if, God forbid, it's over NFS, there is room for a big win; less so if it's on a local disk, which is where the message store ought to be, we can agree.
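To make the I/O point concrete: the index idea from the quoted message amounts to keeping a small per-message summary in a dbm file, so that index pages can be rebuilt without reopening every archived message. What follows is a minimal sketch and not the actual patch. It assumes Python's standard `dbm` module in place of GDBM's C API, and the function names and summary fields are hypothetical.

```python
import dbm
import json


def record_message(index_path, msg_number, subject, sender, date):
    """Store one message's summary under its number (hypothetical layout)."""
    summary = {"subject": subject, "from": sender, "date": date}
    with dbm.open(index_path, "c") as db:
        # Keys and values are stored as bytes; encode explicitly.
        db[str(msg_number).encode("ascii")] = json.dumps(summary).encode("utf-8")


def load_summaries(index_path):
    """Read every stored summary without touching the message files."""
    with dbm.open(index_path, "c") as db:
        return {
            int(key.decode("ascii")): json.loads(db[key].decode("utf-8"))
            for key in db.keys()
        }
```

With something like this in place, archiving a new message costs one small write plus one pass over the index file, instead of one open per message already in the archive.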

Checking just now, I found my notes from my most recent tests of the performance of my DBM hack. To do the test, I created a tool (called "hyperfeed") that passes one message at a time from an mbox to separate invocations of hypermail. Running hypermail once in mbox-at-a-time mode isn't a good test of performance for the case where, as with all the archives I run, messages are archived as they arrive. I did this test with a 750-message mbox, on a local ext2 file system message store (Linux), back in October 1999. I used GDBM as the dbm package, which is regrettably all my code supports. When run with my -g switch to enable the use of the DBM index, it took 78 seconds to complete. Without, 450 seconds, nearly six times as long. I think that qualifies as significant. I used the same hypermail binary for both runs, the same file system, the same mbox, the same clock...

Received on Wed 14 Mar 2001 09:29:30 AM GMT
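The "hyperfeed" harness described in the message can be approximated as below. This is a sketch under stated assumptions, not the original tool: the function name `feed_mbox` is hypothetical, the command to run is supplied by the caller, and each message is piped to the command's standard input without its mbox "From " separator line.

```python
import mailbox
import subprocess
import time


def feed_mbox(mbox_path, command):
    """Run `command` once per message, piping that message to its stdin.

    Returns (message_count, elapsed_seconds). `command` is an argument
    list such as one invocation of the archiver in one-message mode.
    """
    box = mailbox.mbox(mbox_path, create=False)
    count = 0
    start = time.monotonic()
    try:
        for message in box:
            # One fresh process per message, mimicking arrival-time archiving.
            subprocess.run(command, input=message.as_bytes(), check=True)
            count += 1
    finally:
        box.close()
    return count, time.monotonic() - start
```

A run might look like `feed_mbox("test.mbox", ["hypermail", "-u", "-i", "-d", "/path/to/archive"])`, timed once with and once without the index enabled. The flags shown are an assumption about the stock one-message update mode and should be checked against the installed version's manual; the -g switch exists only in the patched build described in the message.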

This archive was generated by hypermail 2.3.0 : Sat 13 Mar 2010 03:46:12 AM GMT