Re: [hypermail] Latin1 subject with UTF-8 body

From: Zvi Har'El <rl_at_math.technion.ac.il_at_hypermail-project.org>
Date: Wed, 18 Jun 2003 11:07:17 +0300
Message-ID: <20030618080717.GB24085_at_fermat.math.technion.ac.il>


Dear Hypermail developers,

I found out that the same problem happens with the From: header. I just got a message to the forum from a friend in Eastern Europe, who uses the iso-8859-2 encoding and also spells his name in this encoding. The name appears correctly on the message page, since the message body and the name were encoded in the same charset. However, the index page is encoded in iso-8859-1, so the name bytes are rendered as the wrong characters! I really think a satisfactory solution should be found (e.g., printing the headers in pure ascii using decimal character entities) if we wish to support i18n in mailing lists.
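
As a rough illustration of the entity idea, here is a minimal sketch in Python (not hypermail's actual C code). The function name header_to_ascii_entities and the iso-8859-1 fallback for unlabeled or unknown charsets are assumptions made only for this example. It decodes the RFC 2047 encoded-words in a header and writes every non-ascii character as a decimal character entity, so the result can be placed in a page of any charset:

```python
from email.header import decode_header


def header_to_ascii_entities(raw_header, fallback_charset="iso-8859-1"):
    """Decode RFC 2047 encoded-words and return a pure-ascii HTML string.

    Hypothetical helper for illustration; fallback_charset is an assumption
    used when a chunk carries no charset or an unknown one.
    """
    pieces = []
    for chunk, charset in decode_header(raw_header):
        if isinstance(chunk, bytes):
            try:
                chunk = chunk.decode(charset or fallback_charset, errors="replace")
            except LookupError:
                # Charset name not known to Python: fall back.
                chunk = chunk.decode(fallback_charset, errors="replace")
        pieces.append(chunk)
    text = "".join(pieces)

    out = []
    for ch in text:
        if ord(ch) < 128 and ch not in "&<>":
            out.append(ch)                 # plain ascii passes through
        else:
            out.append("&#%d;" % ord(ch))  # decimal character entity
    return "".join(out)


subject = ("New mailing address for =?iso-8859-1?Q?the?= "
           "=?iso-8859-1?Q?_Soci=E9t=E9?= Jules Verne")
print(header_to_ascii_entities(subject))
# New mailing address for the Soci&#233;t&#233; Jules Verne
```

Because the output contains only ascii bytes, the same string is valid on an iso-8859-1 index page and on a utf-8 message page; the same function would apply to the name part of a From: header encoded in iso-8859-2.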

Best,

Zvi.

On Mon, 07 Apr 2003 16:23:18 +0300, Zvi Har'El wrote about "[hypermail] Latin1 subject with UTF-8 body":
> Dear Hypermail Developers,
>
> I am using hypermail for archiving my mailing list, the "Jules Verne Forum", at
> <http://JV.Gilead.org.il/forum/>. Yesterday, I sent a mail message to the
> list, which is composed in English with a few French words. In particular, the
> subject line contained French accented characters. My mailer, mutt 1.4, is
> configured to send iso-8859-1 if it can, utf-8 otherwise. In the body of the
> message, I had a quoted French expression, and I hastily decided to use the
> Unicode non-ascii single quotes (U+2018 and U+2019) instead of the ascii single
> quote (U+0027). Therefore, the body of the message was sent in utf-8, not
> iso-8859-1. So, the headers looked as follows:
>
> Subject: New mailing address for =?iso-8859-1?Q?the?=
> =?iso-8859-1?Q?_Soci=E9t=E9?= Jules Verne
> Message-ID: <20030406194902.GB28158_at_fermat.math.technion.ac.il>
> Mime-Version: 1.0
> Content-Type: text/plain; charset=utf-8
>
> ....
>
> Now here is the problem: although the mail itself is completely ok, and the
> index page, which is generated in iso-8859-1, is ok, there was a problem with
> the message page, which was generated in utf-8. The <title> and <h1> tags of
> this page contain the subject, which is written out in iso-8859-1 bytes and
> not in the corresponding utf-8 bytes (the utf-8 representation of ascii
> characters is the identity, but for non-ascii characters, such as the
> accented French characters, it is not). You can see the index file in
> <http://JV.Gilead.org.il/forum/2003/04/> and the message file in
> <http://JV.Gilead.org.il/forum/2003/04/0011.html>
>
> My suggestion is the following: since rfc 2822 requires the message subject
> to be encoded in ascii, independently of the charset of the body, it is
> impossible to store a correct subject in the html file unless it is encoded in
> ascii, i.e., raw html entities. For example, translate =?iso-8859-1?Q?=E9?=,
> which is the e-acute character, to its entity equivalent, &#xe9; (in
> hexadecimal) or &#233; (in decimal). Since from a programming point of view
> the two forms are equivalent, the latter is perhaps better, since older
> browsers may not recognize the former. Therefore, the subject of the mail I
> quoted above should be translated to the ascii string
> New mailing address for the Soci&#xe9;t&#xe9; Jules Verne
> or
> New mailing address for the Soci&#233;t&#233; Jules Verne
> and not to an iso-8859-1 string
> New mailing address for the Société Jules Verne
> as it is currently translated.
>
> I still haven't looked at how this should be implemented in code, but I hope
> it should not be hard.
>
> Best,
>
> Zvi.
>
> --
> Dr. Zvi Har'El mailto:rl_at_math.technion.ac.il Department of Mathematics
> tel:+972-54-227607 icq:179294841 Technion - Israel Institute of Technology
> fax:+972-4-8293388 http://www.math.technion.ac.il/~rl/ Haifa 32000, ISRAEL
> "If you can't say somethin' nice, don't say nothin' at all." -- Thumper (1942)
> Monday, 5 Nisan 5763, 7 April 2003, 3:53PM

-- 
Dr. Zvi Har'El     mailto:rl_at_math.technion.ac.il     Department of Mathematics
tel:+972-54-227607 icq:179294841     Technion - Israel Institute of Technology
fax:+972-4-8293388 http://www.math.technion.ac.il/~rl/     Haifa 32000, ISRAEL
"If you can't say somethin' nice, don't say nothin' at all." -- Thumper (1942)
                               Wednesday, 18 Sivan 5763, 18 June 2003, 11:00AM
Received on Wed 18 Jun 2003 09:10:10 AM GMT

This archive was generated by hypermail 2.2.0 : Thu 22 Feb 2007 07:33:54 PM GMT