Re: Searching hypermail

From: Paradise Cowgirl <minerva_at_phix.com_at_hypermail-project.org>
Date: Thu, 10 Sep 1998 16:07:31 -0700 (PDT)
Message-ID: <Pine.GSO.3.96.980910160520.2874B-100000_at_phix.com>

Hello, everyone,

Here's the snippet of code I mentioned previously. It's pretty brainless: it assumes that anything on a line after "<b" or "<S" (the way hypermail currently formats "Next Message", etc.) should not be indexed.

/* html2txt_hypermail.c -- a hack of the original html2txt.c

   for use with a hypermail archive. Skips material on
   any given line following "<b" or "<S", which we assume means the
   line in question is an "administrative" line in the
   hypermail-generated HTML file. Avoids excessive
   meaningless references when glimpse is invoked. To use,
   compile with "gcc -o html2txt_hypermail html2txt_hypermail.c",
   and edit .glimpse_filters in the relevant directory to
   call html2txt_hypermail rather than html2txt.

   Allin Cottrell (cottrell_at_wfu.edu), December 1997

   June 5, 1998: added skip for lines beginning with "<s"
   (for "<strong>"), for compatibility with new hypermail.
   Untested!!!

*/

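#include <stdio.h>
#include <stdlib.h>     /* for exit() */

int main(void)
{
        int c;

        while (1) {
                c = getchar();
                if (c == EOF) exit(0);  /* end of input is the normal way out */
                if (c != '<') {
                        putchar(c);     /* ordinary text: copy it through */
                        continue;
                }

                /* we just read '<': look at the first character of the tag */
                c = getchar();
                if (c == EOF) exit(0);
                if (c != 'b' && c != 'S') {
                        /* ordinary tag: discard everything up to the closing '>' */
                        while (c != '>') {
                                c = getchar();
                                if (c == EOF) exit(0);
                        }
                } else {
                        /* "<b" or "<S": administrative line, discard the rest of it */
                        while (c != '\n') {
                                c = getchar();
                                if (c == EOF) exit(0);
                        }
                        putchar('\n');
                }
        }
}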
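
If you'd like to check what the filter does without piping files through it, here is a rough sketch of the same rules applied to a string in memory. The function name filter_hypermail and its buffer arguments are my own invention for illustration and are not part of html2txt or glimpse. It is only meant to show the logic, not to replace the filter above.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the original program): applies the same
   rules as the filter above to a NUL-terminated string. Writes at most
   outsize-1 characters plus a terminating NUL into out and returns the
   number of characters written. */
size_t filter_hypermail(const char *in, char *out, size_t outsize)
{
        size_t n = 0;
        char c;

        if (outsize == 0) return 0;

        while (*in != '\0') {
                c = *in++;
                if (c != '<') {
                        /* ordinary text: copy it through if there is room */
                        if (n + 1 < outsize) out[n++] = c;
                        continue;
                }

                /* we just read '<': look at the first character of the tag */
                if (*in == '\0') break;
                c = *in++;
                if (c != 'b' && c != 'S') {
                        /* ordinary tag: discard everything up to the closing '>' */
                        while (c != '>') {
                                if (*in == '\0') goto done;
                                c = *in++;
                        }
                } else {
                        /* "<b" or "<S": administrative line, drop the rest of it
                           but keep the newline */
                        while (c != '\n') {
                                if (*in == '\0') goto done;
                                c = *in++;
                        }
                        if (n + 1 < outsize) out[n++] = '\n';
                }
        }
done:
        out[n] = '\0';
        return n;
}
```

Given the input "<p>Hello <i>world</i>\n<b>Next message:</b> foo\nBye\n", the tags on the first line are stripped and the whole "<b" line collapses to an empty line, so the output buffer holds "Hello world\n\nBye\n". Note that any tag starting with a lowercase "b" (for example "<br>") triggers the same skip, which is the "brainless" part.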

Cheers,
-darci

On Thu, 10 Sep 1998, Robert J. Lebowitz wrote:

> I assume you're trying to exclude phrases, not just specific words???
> 
> No, I don't believe that Swish++ has this capability.  You can specify a
> list of stop words but I don't think that it can be set to identify phrases.
> 
> -----Original Message-----
> From: Allan Schaffer <allan_at_southpark.engr.sgi.com>
> To: hypermail_at_landfield.com <hypermail_at_landfield.com>
> Date: Thursday, September 10, 1998 3:49 PM
> Subject: Re: Searching hypermail
> 
> 
> >On Sep 10,  1:22pm, Robert J. Lebowitz wrote:
> >> Not true!!   There is a new product called Swish++ available at
> >> http://www.best.com/~pjl/
> >> Much better than the original and incredibly fast.
> >
> >Swish comes pretty close to suiting my needs too but (as
> >minerva_at_phix.com mentioned) there doesn't seem to be a way to
> >suppress the indexing of the words in "Next Message", "Previous
> >Message", etc.  Do you know if Swish++ has a way to get around this?
> >
> >Allan
> >
> >--
> >Allan Schaffer                                                allan_at_sgi.com
> >Silicon Graphics                               http://reality.sgi.com/allan
> 

--
information is not knowledge. knowledge is not
wisdom.  wisdom is not truth.  truth is not beauty.
beauty is not love. love is not music.
music is the best.   -- FZ
Received on Fri 11 Sep 1998 01:10:35 AM GMT
