
Re: [MacPerl] Odd behavior on die()



"Patton, Paul B (MN10)" <Paul.B.Patton@HBC.honeywell.com> writes:
>Re Paul Schinder's reply to Doug Roberts:
>
>>}Secondly, you may notice that this print routine is missing the
>>}"Content-type: text/html\n\n" line. It seems that if the file open
>>
>>Alarm bells just went off.  HTTP requires a certain line end for
>>Content-type:.  I don't remember exactly what it is, but it's either
>>\012\012 or \015\012\015\012.  Note that you cannot *portably* write
>>these any other way in Perl, no combination of \r and \n will be
>>correct on all platforms.
>
>The book "HTML & CGI Unleashed" emphasizes that all CGI scripts
>should ensure that the two bytes following the last printable byte of
>a "Content-type..." line are \012\012 to ensure compatibility.

I would use a stronger word than compatibility: Internet standards* specify
that the byte stream appearing on the network MUST be ASCII CR LF. As far
as I'm aware that applies generally for all text based protocols such as
telnet, ftp, smtp, pop3, imap, etc. As I'm sure Paul is only too aware from
unnecessary effort in porting CPAN modules, many of those modules make no
distinction between the host and network character sets. Unfortunately this
sloppy coding practice escapes attention because most perl people are
working on systems which map the C-implementation-dependent \r\n escape
sequences to ASCII \015\012. As Paul mentioned the only way to portably
write these characters in C or Perl is with octal/hex encoding which is
what all perl code dealing with network-ASCII should use. I guess the
problem goes back to teletypes, and to computers never having had a standard
way to mark end-of-line.
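
To make the point concrete, here is a minimal sketch of a CGI header written
with octal escapes. The names $CRLF and cgi_header() are made up for
illustration and are not taken from any CPAN module.

```perl
use strict;
use warnings;

# Network line terminator spelled with octal escapes, so it is always the
# two bytes CR LF no matter how the host maps "\r" and "\n".
my $CRLF = "\015\012";

# Hypothetical helper: build a minimal CGI header block, ending with the
# blank line that separates the headers from the body.
sub cgi_header {
    my ($type) = @_;
    return "Content-type: $type" . $CRLF . $CRLF;
}

my $header = cgi_header("text/html");

# Stop the I/O layer from translating line ends on hosts that do so in
# text mode; otherwise the bytes on the wire may differ from $header.
binmode(STDOUT);
print $header;
```

The binmode() call matters only on hosts whose text-mode output rewrites
line ends, but it is harmless elsewhere.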

I had a brief look at converting one CPAN module I needed, and got lost
working out which uses of '\n' were for strings going to the network and
which were for strings going to host files/display. I got it sort of
working without too much trouble,
but I only used a few parts of the code and certainly wasn't confident with
my fixes. Other people are probably much better than me at intuiting what
code does by inspection, but I still think fixing up '\n's is tricky.
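
One way to keep the two kinds of string apart, sketched here under the
assumption that all network I/O passes through a small pair of helpers, is
to leave '\n' alone in host-side strings and translate only at the network
boundary. to_network() and from_network() are hypothetical names, not part
of any existing module.

```perl
use strict;
use warnings;

my $CRLF = "\015\012";

# Hypothetical helper: turn host newlines into network CR LF just before
# the data is written to a socket.
sub to_network {
    my ($text) = @_;
    (my $wire = $text) =~ s/\n/$CRLF/g;
    return $wire;
}

# Hypothetical helper: turn network CR LF back into the host newline just
# after the data is read from a socket.
sub from_network {
    my ($wire) = @_;
    (my $text = $wire) =~ s/\015\012/\n/g;
    return $text;
}

my $wire = to_network("HELO example.org\n");
my $back = from_network($wire);
```

With this split, the rest of the code can keep using '\n' for files and
display, and only the two helpers need to know about \015\012.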

IMPORTANT QUESTION
Is there some way we can encourage the CPAN maintainers to use \015\012
rather than \r\n in network modules? As mentioned above, it's not
just a matter of changing all \n's to \012, and it's better if the
author/maintainer does this. Do any other perl implementations use unusual
encodings for \r\n ?

This issue is a bit like the way a lot of network code developed on Sun/etc
systems initially misses an occasional htons()/htonl() conversion, simply
because host byte order == network byte order there. It would be a nice
addition to lint tools to keep track of which byte order a variable holds.
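
The Perl analogue of the byte-order point, as a small sketch: pack() with
the "n" and "N" templates always produces network (big-endian) order, so the
result does not depend on the host. The port and address values are
arbitrary examples.

```perl
use strict;
use warnings;

# "n" packs an unsigned 16-bit value in network (big-endian) byte order.
my $port = 8080;
my $wire = pack("n", $port);

# unpack() with the same template recovers the host-side integer.
my ($decoded) = unpack("n", $wire);

# "N" is the 32-bit equivalent; 0x7F000001 is 127.0.0.1.
my $addr = pack("N", 0x7F000001);
```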


Danny Thomas


* HTTP/1.1 (RFC 2068) is flexible with respect to line breaks in textual content, but
not in the protocol messages:
  This flexibility regarding line breaks applies only to text media in
  the entity-body; a bare CR or LF MUST NOT be substituted for CRLF within
  any of the HTTP control structures (such as header fields and multipart
  boundaries).