Re: [hypermail] ultimate searchable archives

From: Bill Moseley <moseley_at_hank.org_at_hypermail-project.org>
Date: Mon, 16 Sep 2002 21:13:21 -0700
Message-Id: <3.0.3.32.20020916211321.02a60760_at_pop3.hank.org>


At 01:24 PM 09/06/02 -0700, Bill Paxton wrote:
>Are there some pre-done htdig modifications out there?
>I checked contrib but nothing I could find. Is there
>something better than htdig?

I'm one of the developers of swish-e (http://swish-e.org). I've used it for indexing hypermail archives -- there's a Perl script in the swish-e distribution that I have used for parsing the metadata from the hypermail HTML messages.
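
For readers who want a feel for what that kind of metadata parsing involves, here is a rough Python sketch. It is not the Perl script from the swish-e distribution. It assumes the messages carry their metadata in HTML comments of the form <!-- name="value" -->, which should be checked against the pages your hypermail version actually writes. The function name extract_metadata and the sample text are made up for illustration.

```python
import re

# Assumed comment format: <!-- field="value" -->, one comment per metadata field.
META_COMMENT = re.compile(r'<!--\s*([A-Za-z][\w-]*)="(.*?)"\s*-->')

def extract_metadata(html):
    """Return a dict of field name -> value taken from metadata comments."""
    return {name.lower(): value for name, value in META_COMMENT.findall(html)}

# Hypothetical sample of the top of an archived message.
sample = '''<!-- sent="Mon, 16 Sep 2002 21:13:21 -0700" -->
<!-- name="Bill Moseley" -->
<!-- subject="Re: [hypermail] ultimate searchable archives" -->
<title>placeholder</title>'''

meta = extract_metadata(sample)
print(meta["name"])
print(meta["subject"])
```

The resulting fields could then be handed to the indexer as separate metadata, so that searches can be limited to, say, the subject or the author.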

The downside is that swish doesn't do incremental indexing, so for a very high volume list it might be a problem. On the other hand, swish is so damn fast[1] at indexing that for most applications you don't need incremental indexing. If your messages are not coming in every second or so then you can typically figure out a way to build an index quickly (e.g. have a master index created once a day, run indexing on just the day's new messages every minute or so, and search both indexes at the same time).
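
The two-index scheme can be sketched roughly as follows, in Python. This is only an illustration under assumptions: the file names (hypermail.conf, master.index, today.index), the helper names, and the modification times are all made up, and the swish-e switches used (-c config file, -f index file(s), -i input files, -w search words) are given as recalled from the swish-e 2.x documentation, so verify them against your installed version. The sketch only builds the command lines; it does not run swish-e.

```python
def new_since(mtimes, master_built_at):
    """Paths whose modification time is later than the master index build."""
    return sorted(path for path, mtime in mtimes.items() if mtime > master_built_at)

def index_command(config, index_file, inputs):
    # Assumed switches: -c config file, -f index to write, -i files/dirs to index.
    return ["swish-e", "-c", config, "-f", index_file, "-i", *inputs]

def search_command(query, index_files):
    # Assumed switches: -w search words, -f one or more indexes to search.
    return ["swish-e", "-w", query, "-f", *index_files]

# Hypothetical message files with made-up modification times (seconds).
mtimes = {
    "archive/0001.html": 100.0,
    "archive/0002.html": 250.0,
    "archive/0003.html": 300.0,
}

todays = new_since(mtimes, master_built_at=200.0)

daily_cmd = index_command("hypermail.conf", "master.index", ["archive"])   # once a day
minute_cmd = index_command("hypermail.conf", "today.index", todays)        # every minute or so
query_cmd = search_command("htdig", ["master.index", "today.index"])       # search both

print(" ".join(minute_cmd))
print(" ".join(query_cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) from a cron job. In real use the modification times would come from os.path.getmtime on the message files rather than a hand-written dict.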

The swish-e list is a hypermail archive and it's searchable at

   http://swish-e.org/Discussion/search/swish.cgi

You could probably come up with a better looking interface.

[1] Fast is subjective, of course. On my Athlon I can index 100,000 2K text files in about three minutes. YMMV.

-- 
Bill Moseley
mailto:moseley_at_hank.org
Received on Fri 20 Sep 2002 12:46:44 AM GMT

This archive was generated by hypermail 2.2.0 : Thu 22 Feb 2007 07:33:54 PM GMT