LSTOWN-L Archives

LISTSERV List Owners' Forum

Hal Keen
Thu, 6 Sep 2007 11:40:39 -0500
Donna Grant and I have been comparing notes on our problems with HTML and attachments sent to lists
from Lotus Notes. Below my signature is an example of the MIME headers and boundary markers for a
problem message. (This particular case was Donna's; I have replaced the message content with
descriptive notes in brackets.)

Both cases were observed with LISTSERV version 14.5. In both cases, a post with an attachment was
sent to the list from Lotus Notes. Because Notes (whether by design or by configuration) generates
mail in HTML, the message is sent with a MIME Content-Type of multipart/mixed, with separate parts
for the message text and the attachment. The message text is itself a composite of the
multipart/alternative type, conveying both plaintext and HTML. This requires two sets of MIME
boundary delimiters, one for the whole message and one for the alternative portion.

That is, the MIME structure in outline form would be:
I. Whole message (multipart/mixed)
  A. Message text (multipart/alternative)
    1. Message text (plain)
    2. Message text (HTML)
  B. Attachment (base64)

One boundary delimiter is defined to separate parts A and B, and another to separate A's parts 1 and
2.
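
For readers who want to reproduce this structure without Lotus Notes, here is a minimal sketch
using Python's standard email package. The body text and attachment bytes are placeholders, and
the boundary strings Python generates will not contain white space, so this only reproduces the
nesting, not the Notes delimiters.

```python
from email.message import EmailMessage

# Build a message shaped like the outline above (all contents are placeholders).
msg = EmailMessage()
msg.set_content("plain text body")                       # A.1: text/plain
msg.add_alternative("<p>HTML body</p>", subtype="html")  # A.2: msg becomes multipart/alternative
msg.add_attachment(b"placeholder bytes", maintype="application",
                   subtype="octet-stream", filename="test.doc")  # B: msg becomes multipart/mixed


def outline(part, depth=0):
    """Return the content types of `part` and its subparts, indented by nesting depth."""
    lines = ["  " * depth + part.get_content_type()]
    if part.is_multipart():
        for sub in part.iter_parts():
            lines.extend(outline(sub, depth + 1))
    return lines


print("\n".join(outline(msg)))
```

The printed outline should show multipart/mixed at the top, multipart/alternative with its two
text parts one level down, and the attachment as a sibling of the alternative part.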

After checking it against RFC 2046 (MIME Part Two: Media Types), I believe the MIME encoding to
be valid. Yet Donna and I have seen two different misbehaviors when it is processed by LISTSERV:

(1) On Donna's list, the message with attachment got distributed despite Attachments= No in the list
configuration. (The attachment sent to my list was a permitted type.) Donna has just told me there
have been more cases of that, all sent using Lotus Notes.

(2) On both lists, the HTML part was not stripped despite Language= NOHTML in the configuration.

This appears to be some kind of bug in LISTSERV, and its nature is not obvious. We may need some
help from those able to examine the code.

My first suspicion was that the multi-layered MIME structure was confusing LISTSERV and preventing
it from recognizing parts that were nested two levels deep. However, that would not account for
Donna's case, where the attachment (just one level deep) was not blocked.

Donna's problem could be explained by a parsing algorithm that lost track of all but the last
boundary delimiter seen, and failed to recognize the attachment because, by the time it processed
that part of the post, it was looking only for the second-layer delimiter defined for the
multipart/alternative portion. If that were the case, however, one would expect the HTML to be
stripped successfully.
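
The hypothesis above can be made concrete with a small sketch. This is purely illustrative and
makes no claim about how LISTSERV is actually written: the part_starts function, its remember_all
flag, and the OUTER and INNER boundary strings are all invented for the example. The scanner either
remembers every boundary declared so far or only the most recent one, and reports which lines it
recognizes as the start of a body part.

```python
import re

BOUNDARY_RE = re.compile(r'boundary="([^"]+)"')

# Simplified stand-in for the nested message; OUTER and INNER are made-up boundaries.
LINES = [
    'Content-Type: multipart/mixed; boundary="OUTER"',        # 0
    '',                                                       # 1
    '--OUTER',                                                # 2  start of part A
    'Content-Type: multipart/alternative; boundary="INNER"',  # 3
    '',                                                       # 4
    '--INNER',                                                # 5  start of part A.1
    'Content-Type: text/plain',                               # 6
    '',                                                       # 7
    'plain text',                                             # 8
    '--INNER',                                                # 9  start of part A.2
    'Content-Type: text/html',                                # 10
    '',                                                       # 11
    '<p>html</p>',                                            # 12
    '--INNER--',                                              # 13
    '--OUTER',                                                # 14 start of part B (attachment)
    'Content-Type: application/octet-stream',                 # 15
    '',                                                       # 16
    'AAAA',                                                   # 17
    '--OUTER--',                                              # 18
]


def part_starts(lines, remember_all):
    """Return indexes of lines recognized as the start of a body part."""
    known = []   # boundaries the scanner is currently looking for
    starts = []
    for i, line in enumerate(lines):
        declared = BOUNDARY_RE.search(line)
        if declared:
            if remember_all:
                known.append(declared.group(1))
            else:
                known = [declared.group(1)]  # hypothetical bug: forget earlier boundaries
            continue
        if any(line == "--" + b for b in known):
            starts.append(i)
    return starts


print(part_starts(LINES, remember_all=True))
print(part_starts(LINES, remember_all=False))
```

With remember_all=False the scanner never sees line 14 as a part start, so the attachment goes
unrecognized, yet it still finds the HTML part at line 9. That matches the objection raised above:
this failure mode would explain the attachment but not the unstripped HTML.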

Besides, it seems very likely that the problem is confined to a set of narrow conditions. Other
email clients generate messages with the same nested structure. I easily created a sample, for
instance, in Outlook Express.

The whole problem might actually hinge on a peculiarity of the boundary delimiters generated by
Lotus Notes. As shown in the sample below, it generates delimiters containing white space. This is
specifically permitted by RFC 2046; section 5.1.1 includes an example with white space and
complicates the BNF definitions explicitly to allow it. However, most email clients, e.g. Outlook
Express in my trial case, avoid doing so, probably because a couple of requirements in that section
combine to make white space in a delimiter more likely to cause problems. These are:

   Boundary string comparisons must compare the
   boundary value with the beginning of each candidate line.  An exact
   match of the entire candidate line is not required; it is sufficient
   that the boundary appear in its entirety following the CRLF.

   If a boundary delimiter line appears to end with white space,
   the white space must be presumed to have been
   added by a gateway, and must be deleted.
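
To show how these two rules interact with white space inside a delimiter, here is a minimal
sketch. The function names are invented for illustration, and naive_match is a hypothetical
shortcut, not a description of LISTSERV or any other real product. The first function applies the
two quoted rules literally; the second takes the first white-space-separated token of the line,
which works for delimiters without spaces but not for the Notes style.

```python
NOTES_BOUNDARY = "=_mixed 0058420A8025730C_="  # taken from the sample message


def match_boundary(line, boundary):
    """Classify a line as 'delimiter', 'close', or 'body' using the two quoted rules."""
    candidate = line.rstrip(" \t\r\n")   # trailing white space is presumed added by a gateway
    delim = "--" + boundary
    if not candidate.startswith(delim):  # compare against the beginning of the line only
        return "body"
    rest = candidate[len(delim):]
    return "close" if rest.startswith("--") else "delimiter"


def naive_match(line, boundary):
    """Hypothetical shortcut: compare only the first white-space-separated token."""
    tokens = line.split()
    token = tokens[0] if tokens else ""
    if token == "--" + boundary + "--":
        return "close"
    if token == "--" + boundary:
        return "delimiter"
    return "body"


for line in ("--" + NOTES_BOUNDARY + "  \r\n", "--" + NOTES_BOUNDARY + "--\r\n"):
    print(repr(line), match_boundary(line, NOTES_BOUNDARY), naive_match(line, NOTES_BOUNDARY))
```

The shortcut classifies both Notes delimiter lines as body text because the first token stops at
the embedded space, while it would handle a space-free delimiter correctly.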

In light of these requirements, if LISTSERV has problems handling the rare case where a MIME
multipart boundary delimiter contains white space, that would be entirely understandable. It's a
complicated definition and could require a tricky design to cover all cases.

But could somebody please find out if that's the problem? With our configurations, HTML and
prohibited attachments should not be getting through our lists.

Hal Keen

=== sample MIME portion of a Lotus Notes message, with attachment ===

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=_mixed 0058420A8025730C_="

--=_mixed 0058420A8025730C_=
Content-Type: multipart/alternative; boundary="=_alternative 0058420A8025730C_="


--=_alternative 0058420A8025730C_=
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: quoted-printable

[ plain text content snipped ]


--=_alternative 0058420A8025730C_=
Content-Type: text/html; charset="US-ASCII"
Content-Transfer-Encoding: quoted-printable


[ HTML content snipped ]
--=_alternative 0058420A8025730C_=--
--=_mixed 0058420A8025730C_=
Content-Type: application/octet-stream; name="test.doc"
Content-Disposition: attachment; filename="test.doc"
Content-Transfer-Encoding: base64

[ base64 content snipped ]

--=_mixed 0058420A8025730C_=--
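
As an independent cross-check, the sample above can be fed to another MIME parser. The sketch
below uses Python's standard email package on the sample exactly as shown, bracketed placeholders
included. It does not prove the message conforms to RFC 2046, and it says nothing about LISTSERV's
own parser; it only shows that one other implementation reads the nesting as described.

```python
from email import message_from_string

SAMPLE = "\n".join([
    'MIME-Version: 1.0',
    'Content-Type: multipart/mixed; boundary="=_mixed 0058420A8025730C_="',
    '',
    '--=_mixed 0058420A8025730C_=',
    'Content-Type: multipart/alternative; boundary="=_alternative 0058420A8025730C_="',
    '',
    '',
    '--=_alternative 0058420A8025730C_=',
    'Content-Type: text/plain; charset="US-ASCII"',
    'Content-Transfer-Encoding: quoted-printable',
    '',
    '[ plain text content snipped ]',
    '',
    '',
    '--=_alternative 0058420A8025730C_=',
    'Content-Type: text/html; charset="US-ASCII"',
    'Content-Transfer-Encoding: quoted-printable',
    '',
    '',
    '[ HTML content snipped ]',
    '--=_alternative 0058420A8025730C_=--',
    '--=_mixed 0058420A8025730C_=',
    'Content-Type: application/octet-stream; name="test.doc"',
    'Content-Disposition: attachment; filename="test.doc"',
    'Content-Transfer-Encoding: base64',
    '',
    '[ base64 content snipped ]',
    '',
    '--=_mixed 0058420A8025730C_=--',
    '',
])

msg = message_from_string(SAMPLE)

# walk() visits the container first, then each subpart depth-first.
types = [part.get_content_type() for part in msg.walk()]
attachments = [part.get_filename() for part in msg.walk() if part.get_filename()]

print(types)
print(attachments)
```

If the parser finds both the text/html part and the test.doc attachment, the delimiters with
embedded white space are at least readable by a parser that treats the quoted boundary value as a
single string.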
