Re: Question about MIME and HTML Email

From: Byron C. Darrah <bdarr_at_sse.FU.HAC.COM_at_hypermail-project.org>
Date: Thu, 15 Oct 1998 11:40:12 -0700 (PDT)
Message-Id: <199810151840.LAA04691_at_pepperoni.pizza.hac.com>

> Date: Thu, 15 Oct 1998 20:13:43 +0200 (MET DST)
> From: Daniel Stenberg <Daniel.Stenberg_at_sth.frontec.se>
>
> On Thu, 15 Oct 1998, Byron C. Darrah wrote:
>
> > I think it would be highly appropriate for hypermail to be HTML-friendly
> > since, after all, its purpose is to output web pages.
>
> I agree. We "must" make it work.
>
> > So a filter is needed that sanitizes HTML so that it can be directly
> > included within a hypermail message.
>
> Yes.

Cool, I'm glad Daniel and I agree on almost everything.

> > -- Strip out any unknown or undesirable HTML tags.
>
> Eh, no, it should keep unknown tags and remove undesirable ones. We can't
> possibly know what kind of tags they will introduce in the future.

Hmm, perhaps we should see what the others on this list think. I could seriously go either way on this. I lean toward removing unknown tags for the very same reason: we can't possibly know what kind of tags they will introduce in the future. Since we can't predict what a browser will do with a tag we don't recognize, the safest move is to strip out unknowns.

I think the HTML (or SGML?) spec says that unrecognized tags should be ignored by a parser. So, if we leave them in, then they won't really be "ignored" since a web browser reading a hypermail page might try to use them for something. If we take them out, then the result at the browser level will be the same as ignoring them.
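
To make the strip-unknowns option concrete, here is a minimal sketch in Python using the standard library's html.parser module. It is only an illustration, not hypermail code: the ALLOWED_TAGS whitelist and the names WhitelistFilter and strip_unknown are hypothetical, and attributes are simply dropped to keep it short. Unknown tags are removed but the text inside them is kept, which is the same thing a browser does when it ignores a tag.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist for illustration; the real set of known,
# desirable tags would have to be agreed on by the list.
ALLOWED_TAGS = {"p", "br", "b", "i", "em", "strong", "pre",
                "ul", "ol", "li", "table", "tr", "td"}


class WhitelistFilter(HTMLParser):
    """Re-emit only whitelisted tags; keep all text content."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Attributes are dropped entirely in this sketch.
        if tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text arrives with character references decoded, so re-escape it.
        self.out.append(escape(data, quote=False))


def strip_unknown(html_text):
    """Return html_text with every non-whitelisted tag removed."""
    f = WhitelistFilter()
    f.feed(html_text)
    f.close()  # flush any text still buffered by the parser
    return "".join(f.out)
```

For example, strip_unknown("<b>hi</b> <blink>there</blink>") keeps the bold markup and the word "there" but drops the <blink> tags. Keeping unknown tags instead would amount to replacing the whitelist with a blacklist of undesirable tags.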

Another consequence arises because the filter should either close or eliminate unclosed containers. But the filter won't know which future tags are containers and which aren't. So here again, the safest move is to remove unrecognized tags.
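
The container problem can be sketched the same way. The Python below (again using the standard html.parser module) keeps a stack of open containers and closes whatever is still open at the end. The CONTAINERS and VOID_TAGS sets and the names ClosingFilter and close_containers are assumptions made up for this example. Note that it only works because the filter has a fixed list of which tags are containers; a tag in neither set is dropped, since there is no way to tell whether it needs closing.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical tag lists for illustration only.
CONTAINERS = {"table", "tr", "td", "ul", "ol", "li", "b", "i", "pre"}
VOID_TAGS = {"br", "hr"}  # known tags that never take a closing tag


class ClosingFilter(HTMLParser):
    """Track open containers so that every one gets a matching close."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []  # stack of containers currently open

    def handle_starttag(self, tag, attrs):
        if tag in CONTAINERS:
            self.open_tags.append(tag)
            self.out.append(f"<{tag}>")
        elif tag in VOID_TAGS:
            self.out.append(f"<{tag}>")
        # Anything else is unknown and is dropped.

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray closing tag with no matching open: drop it
        # Close everything opened after the matching tag, then the tag itself.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

    def finish(self):
        # Close whatever is still open, innermost first.
        while self.open_tags:
            self.out.append(f"</{self.open_tags.pop()}>")
        return "".join(self.out)


def close_containers(html_text):
    """Return html_text with all known containers properly closed."""
    f = ClosingFilter()
    f.feed(html_text)
    f.close()  # flush any text still buffered by the parser
    return f.finish()
```

Given the fragment "<table><tr><td>x", this appends the missing </td>, </tr> and </table> in that order, so a message that forgets to close its table cannot swallow the rest of the archive page.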

I invite further discussion.

> > -- Strip out content of <head> and <title> containers.
>
> Those are undesirable and should be removed.
>
> > -- Close any open containers (e.g., make sure that <table> is
> > always eventually followed by a matching </table>).
>
> <table> is indeed one of the more nasty things we can do to mess up a
> Netscape page :-)
>
> > I actually have some code sitting around that would probably be easy for
> > me to modify to get it to do this. Kent, if you like, I could do this
> > and give you the result.
>
> I'd be happy to assist if there's a need for it.

Great! I'd like to at least hear from Kent before doing anything on this, but if he says "go" then I'll probably be able to use some help.

--Byron Darrah

Received on Thu 15 Oct 1998 08:43:17 PM GMT
