LSTOWN-L Archives

LISTSERV List Owners' Forum

Hal Keen <[log in to unmask]>
Sun, 6 May 2007 18:24:53 -0500
text/plain (107 lines)
Kathy,

Sorry about the delay. This set of comments took a while because I had to do
some experiments and some documentation searches.

> You seem to be saying that it's the way a particular email client renders
> content. If that's the case though, why do about 10% of people's
> messages have these = characters and not all or none of them?

That depends on whether the receiving email client recognizes both the MIME
description and the format and translates them. And when any kind of
transfer (such as cut and paste) occurs, if at any stage something does not
translate, then the QP formatting interjections remain visible from then on,
because they can all be conveyed as plain text (necessarily, because they
are designed to be conveyable via email).

>Curiously the characters don't appear with the = coding on the
>archive. 4'33? gets rendered as 4′33″ in the archive.

When I attempted to save this, the characters in that quote caused me to be
presented with a choice: send "as unicode" (requiring unicode support to
read) or send "as is" (and lose the 8-bit characters). These are choices
between two Content-Type settings, both claiming to be text/plain, but with
charset="UTF-8" for unicode and charset="Windows-1252" otherwise. Either
way, my client used a transfer encoding of 8bit, which can result in
characters changing between systems. I don't know if Unicode can prevent
those changes.
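
As a rough illustration of the two choices, here is a minimal Python sketch. The sample string (digits with a prime and a double prime) and the "?" substitution are assumptions for the example; an actual mail client may drop or replace the characters differently.

```python
# Hypothetical sample text: digits with PRIME (U+2032) and DOUBLE PRIME (U+2033).
text = "4\u203233\u2033"

# "Send as Unicode": UTF-8 can represent every character, using three
# octets for each of the two prime marks.
as_utf8 = text.encode("utf-8")

# "Send as is": Windows-1252 has no code for these two characters, so a
# lossy encoder must substitute something (Python substitutes "?").
as_cp1252 = text.encode("cp1252", errors="replace")

print(as_utf8)    # b'4\xe2\x80\xb233\xe2\x80\xb3'
print(as_cp1252)  # b'4?33?'
```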

I have been trying to figure out the subsequent steps. The expansion
continues if you retransmit those expressions, and it appears to be due to
the fact that UTF-8 consistently generates codes that require further
encoding to retransmit them. What makes the difference is the setting I use
to view the received file.
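
One way such expansion can arise is sketched below in Python, using the standard quopri module. It assumes the already-encoded text gets treated as literal text and encoded a second time; that is only one possible mechanism, not a confirmed account of what any particular client does.

```python
import quopri

# Hypothetical sample: "4", PRIME, "33", DOUBLE PRIME as UTF-8 octets.
original = "4\u203233\u2033".encode("utf-8")

once = quopri.encodestring(original)   # the intended QP encoding
twice = quopri.encodestring(once)      # QP text mistaken for literal text, encoded again

print(once)   # b'4=E2=80=B233=E2=80=B3'
print(twice)  # b'4=3DE2=3D80=3DB233=3DE2=3D80=3DB3'

# Every "=" becomes "=3D" on the second pass, so the text keeps growing.
print(len(original), len(once), len(twice))  # 9 21 33
```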

I'll bet we're not all reading those strings the same, either.

>It seemed to make sense that it was the Content-Transfer-Encoding:
>field that was not converting the characters and adding =20 for line
>feeds, but then I saw the messages that appear this way on my
>apparently utf-8 Apple Mail.app also appeared with the = characters
>on my Palm Treo, so I suppose that's utf-8 as well.

By the way, =20 is not a line feed but a space character. QP adds those
whenever the end-of-line has spaces in front of it, in an attempt to keep
them from being lost in translation.
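
A small Python sketch of that behavior, using the standard quopri module. The sample line is made up for the example.

```python
import quopri

# Hypothetical sample: the first line ends with a space before the line break.
line = b"trailing space \nnext line\n"

encoded = quopri.encodestring(line)
print(encoded)  # b'trailing space=20\nnext line\n'

# Only the space at the end of the line is protected; the space in the
# middle of the line is left alone. Decoding restores the original octets.
decoded = quopri.decodestring(encoded)
print(decoded == line)  # True
```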

The ability to handle UTF-8 content and the ability to handle
quoted-printable transfer encoding should not be mutually exclusive. However,
if you're doing both, getting the translations in the right order might be
tricky, because UTF-8 uses multiple-octet sequences to represent characters
in the higher ranges and QP is only designed to encode one octet at a time.
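
To make the ordering concrete, here is a minimal Python sketch. It assumes a message body that was UTF-8 text sent with quoted-printable transfer encoding; the sample string is hypothetical. The transfer encoding is undone first to recover the octets, and only then are the octets interpreted as UTF-8.

```python
import quopri

# Hypothetical body as it travels on the wire: UTF-8 octets, QP-encoded.
wire = b"4=E2=80=B233=E2=80=B3"

# Step 1: undo the transfer encoding, one octet at a time.
octets = quopri.decodestring(wire)

# Step 2: interpret the octets with the declared charset.
text = octets.decode("utf-8")
print(text)   # 4′33″

# Reading the same octets with the wrong charset turns each three-octet
# UTF-8 sequence into three separate characters.
wrong = octets.decode("cp1252")
print(wrong)  # 4â€²33â€³
```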

Notes on standards: Quoted-printable is described in RFC 2045, one of the
basic MIME standards-track RFCs. UTF-8 is currently Internet standard 63,
RFC 3629, after having gone through multiple RFC versions. Supposedly, it is
consistent with ISO 10646, which defines a Universal Character Set (UCS)
which allegedly matches the Unicode standard.

>Apple Mail uses Automatic encoding
><http://docs.info.apple.com/article.html?path=Dotmac/Mail/en/mac70.html>
>
>What seems to be happening is that when someone using an email
>application that does "automatic encoding" encounters a utf-8
>character, it changes to that content-type. This can be done with cut
>and paste (as with my 4'33' example) or by replying to an email that
>is already encoded in utf-8.
>
>Accents will trigger the utf-8 encoding as well.

That makes sense; they use characters beyond the 7-bit ASCII set.

The really strange part is using 8bit as the default transfer encoding for
UTF-8. UTF-8 encodes any character requiring more than 7 bits as a string of
octets ALL of which have values above the 7-bit range, so none of them is
safe on a 7-bit path. Base64 would seem to be the safer choice, although it
would tend to expand things even more.
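
The trade-off can be checked with a short Python sketch using the standard base64 and quopri modules. Both sample payloads and the helper name encoded_sizes are made up for the example. For mostly-ASCII text, base64 is the larger of the two; for text made up entirely of non-ASCII characters, quoted-printable is larger.

```python
import base64
import quopri


def encoded_sizes(payload: bytes) -> tuple[int, int, int]:
    """Return (raw, quoted-printable, base64) sizes in octets."""
    return (
        len(payload),
        len(quopri.encodestring(payload)),
        len(base64.encodebytes(payload)),
    )


# Hypothetical payloads: mostly ASCII with one accent per sentence, and
# a run of PRIME characters whose UTF-8 octets are all above 0x7F.
ascii_heavy = ("plain text with one accent: caf\u00e9. " * 4).encode("utf-8")
high_only = ("\u2032" * 30).encode("utf-8")

print(encoded_sizes(ascii_heavy))  # QP is smaller than base64 here
print(encoded_sizes(high_only))    # base64 is smaller than QP here
```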

That's not a snotty remark about your email client. My own isn't that
clever, either; it gives me a choice of sending plain text as unencoded, QP,
base64, or (non-MIME) uuencoded, but even if it gives me the warning about
out-of-range data and offers me the chance to switch to Unicode (transferred
in UTF-8), it's not smart enough to change to base64 too. Let these things
do the thinking for us, and it takes a research project to figure out what
we'll get!

>I'll send a small email after this without any accents, extended
>characters or pastes that will show Content-Transfer-Encoding to be
>7bit.

It did.

>It seems there should be a way for listserv to overlook 8 bit
>characters. Perhaps that is what "Translate = No" will do.

It's not clear to me what specific control characters are removed using
Translate=No, but there is some background on the archive of this list
suggesting that it is needed to allow some double-octet (i.e., Unicode)
character sets through. However, it is stated that 8-bit characters are not
the ones affected.

I think this URL will get you to the same search I used:
http://peach.ease.lsoft.com/scripts/wa.exe?S2=LSTOWN-L&q=Translate%3D&s=&f=&a=&b=
(Note that it ENDS with an equals sign!)

Hal
