
Re: [FWP] reinventing the header removal wheel

Grant McLean <grantm@web.co.nz> wrote:
> > perl -ne 'print if ($B or !/\S/ and $B++ )' < a mail message
> and if you s/\$B/$b/g it will even run under use strict :-)
>   perl -Mstrict -ne 'print if ($b or !/\S/ and $b++ )' < a mail message

I usually prefer to use the much-neglected scalar .. flip-flop operator:

  perl -ne 'print unless 1../^$/' < message
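For anyone who hasn't met the flip-flop before, here is the same trick spelled out as a small script. It is only a sketch: the sample message is made up, and it reads from an in-memory filehandle instead of a real file. The key detail (from perlop) is that a constant operand of scalar `..` is compared against the current line number `$.`, so `1 .. /^$/` turns true on line 1 and stays true up to and including the first empty line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample message, LF-terminated as in a local mailbox.
my $message = <<'MSG';
From: alice@example.org
Subject: hello

First body line.
Second body line.
MSG

open my $fh, '<', \$message or die "cannot open in-memory message: $!";

my @body;
while (<$fh>) {
    # Scalar `..` is the flip-flop: the constant 1 is true when $. == 1,
    # and the operator stays true through the first line matching /^$/.
    # So the header and the separator are skipped; the rest is kept.
    push @body, $_ unless 1 .. /^$/;
}
close $fh;

print @body;
```

Running it prints just the two body lines.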

<mail protocol geekery>

/^$/ is more correct than !/\S/ (or /^\s*$/) for detecting the blank
line separating header from body in an RFC822 format message.  The
blank line is supposed to be *only* a line terminator; white space is
not allowed.
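A quick way to see the difference is to run both tests over a few candidate separator lines. This is an illustrative sketch with made-up input; it assumes lines read with the default $/, so each one still ends in "\n". (Raw CRLF data from the wire would need the "\r" stripped first, since $ in a regex only matches before a trailing "\n".)

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Candidate header/body separator lines (sample data only).
my %lines = (
    empty  => "\n",
    spaces => "   \n",
    tab    => "\t\n",
);

my (%strict, %lenient);
for my $name (sort keys %lines) {
    my $line = $lines{$name};
    # /^$/ matches only a truly empty line: $ may match just before the
    # trailing newline, so "\n" qualifies but "   \n" does not.
    $strict{$name}  = $line =~ /^$/  ? 1 : 0;
    # !/\S/ (like /^\s*$/) also accepts whitespace-only lines.
    $lenient{$name} = $line !~ /\S/ ? 1 : 0;
    printf "%-6s  /^\$/: %d   !/\\S/: %d\n",
        $name, $strict{$name}, $lenient{$name};
}
```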

Unfortunately, there has been software that either generated messages
with white space in the header/body separator, or inserted it while
the message was in transit.  However, this is pretty rare, especially
now.  Also unfortunately, the header folding rules in 822 allow
arbitrary folding with extra whitespace-only lines in the middle of
a header line.
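To make the folding problem concrete, here is a sketch that splits one message with both rules. The message and the split_message helper are hypothetical, written just for this illustration (not taken from any mail library); the Received: header is folded around a whitespace-only continuation line, which is the case where the lenient test goes wrong.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical message: the folded Received: header contains a
# whitespace-only continuation line ("   \n").
my $message = "Received: from a.example.org\n"
            . "   \n"
            . "    by b.example.org\n"
            . "Subject: test\n"
            . "\n"
            . "Body text.\n";

# Hypothetical helper: split a message into (header, body) at the first
# line for which $is_separator returns true; that line is dropped.
sub split_message {
    my ($text, $is_separator) = @_;
    my ($header, $body, $in_body) = ('', '', 0);
    for my $line (split /^/m, $text) {    # keeps each line's "\n"
        if ($in_body) {
            $body .= $line;
        } elsif ($is_separator->($line)) {
            $in_body = 1;
        } else {
            $header .= $line;
        }
    }
    return ($header, $body);
}

my ($h_strict,  $b_strict)  = split_message($message, sub { $_[0] =~ /^$/ });
my ($h_lenient, $b_lenient) = split_message($message, sub { $_[0] !~ /\S/ });

print "strict body:\n$b_strict\n";
print "lenient body:\n$b_lenient\n";
```

With /^$/ the whole folded header stays in the header; with !/\S/ the split happens at the whitespace-only line, so the rest of Received: and the Subject: line end up in the body.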

I brought this up on the IETF DRUMS list a couple of years ago.  I
wanted 822's replacement to say that a blank line containing white
space should be parsed as the header/body separator.  I ran a search
of my archive of tens of thousands of messages and did find a very few
(fewer than ten) messages that actually had whitespace-only lines in
the middle of folded headers - they were all Received: lines generated
by Smail.

I argued that the damage caused by mis-parsing part of the body as
header (part of the body may then be hidden or dropped and never seen
by the recipient) is worse than the damage that may be caused by
mis-parsing part of the header as body (mail filter or client may file
the message in the wrong place or display incorrect summary
information, but the recipient will see the missing header information
when they read the message and realize what happened).  However, I
wasn't able to get consensus on that point.  The RFC822 replacement
will still require that only a truly blank line, with no white space,
be parsed as the header/body separator.  We *did*, however, add a new
rule forbidding the introduction of whitespace-only lines when folding
header lines, something that was very rare to begin with.

So this isn't really important, and neither approach is likely to fail
in practice.  But if you want to stick to the spec, use /^$/.

</mail protocol geekery>

  --  Cos (Ofer Inbar)  --  cos@polyamory.org  cos@wbrs.org
  --  Exodus Professional Services  --  cos@ne.cohesive.com
 "This may seem a bit weird, but that's okay, because it is weird."
    -- Larry Wall in perlref(1) man page, Perl 5.001
