Re: [MacPerl] parsing text from a file




Matthew Langford wrote:

> Isn't this essentially just a MacPerl behavior?  For years, few
> distinguished between octal codes for \n and \r, on any platform including
> Macs, until a certain innocent design decision to remap these in MacPerl.

No, this is not just MacPerl.  Mac C compilers (including Metrowerks and
things from Apple such as MrC) take "\n" -> \015.  The iostream in C++
takes "cout << endl;" -> \015 on the Mac.  Java has a special method for
you to ask it what EOL it would use on the current platform.

> And "CRLF" is mostly a DOS thing...the real question _used_ to be what did
> your platform use as EOL (end-of-line) marker:  LF (Unix), CR (Mac), or
> CRLF (DOS).

Right, and within Perl and C on all platforms EOL is spelled "\n";
it happens to map to different character(s) depending on where you are.
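
One way to see that mapping for yourself is to write "\n" to a file
opened in text mode and read it back in binary mode.  The following is
only a minimal sketch in C; the function name eol_bytes and the scratch
file name passed to it are hypothetical, and what it reports depends
entirely on the compiler and C library in use.

```c
#include <assert.h>
#include <stdio.h>

/* Write one logical newline in text mode, then read the file back in
   binary mode to see which byte(s) the C library actually stored.
   Returns the number of bytes copied into out, or 0 on failure.
   path names a scratch file that is removed afterwards. */
size_t eol_bytes(const char *path, unsigned char *out, size_t outsize)
{
    assert(path != NULL && out != NULL);

    FILE *fp = fopen(path, "w");    /* text mode: "\n" may be translated */
    if (fp == NULL)
        return 0;
    fputs("\n", fp);
    if (fclose(fp) != 0)
        return 0;

    fp = fopen(path, "rb");         /* binary mode: bytes come back as stored */
    if (fp == NULL)
        return 0;
    size_t n = fread(out, 1, outsize, fp);
    fclose(fp);
    remove(path);
    return n;
}
```

Calling eol_bytes("eol_probe.tmp", buf, sizeof buf) and printing each
byte of buf in octal shows what "\n" became in a text file on that
particular platform.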

> I guess it all started with the C programming language and "\n" as the
> universal force-a-new-line.  linefeed/lf/\n became EOL in a bazillion
> programs, so the platforms started having to adjust to the code, rather than
> vice versa.  Perl could have done better, but really it has even deeper
> ties to the Unix tradition, so I can't blame it.

Tell it to Microsoft and Apple (as well as the ISO and ANSI): when they
drew up language standards, "\n" within C was left as "implementation defined".
So Apple went with the CR character, and MS went with CRLF on files that had
not been fopen()ed with a "b" in the second argument.  This is part of the
"definition" of a "text" file on the respective platforms.  Unix uses the
linefeed character for "text" files' EOL, but people often confuse "\n"
with LINEFEED.  "\n" was intended to represent logical newline and does so
within C and Perl on Unix, Windows, mainframes, VAXes, and the Mac.  Do not
confuse "newline" with "linefeed".  "LF", "LINEFEED", is ASCII character 10.
Unfortunately, the manual pages that come with some Unices (see `man ascii`)
do refer to the 11th character as NL; such doc sets are in error (there are
some that abbreviate it LF too).  Note that the Unicode standard calls \xA a
"LINE FEED" character (Unicode Standard V 2.0, ISBN 0-201-48345-9, page 7-7).

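If you have to parse text that may have come from any of the three
platforms, it can help to name the characters by their codes instead of
relying on "\n" and "\r".  Here is a minimal sketch in C under that
assumption; the function name normalize_eol and the LF and CR macros
are mine, not part of any library.

```c
#include <assert.h>
#include <stddef.h>

#define LF '\012'   /* ASCII 10, LINE FEED */
#define CR '\015'   /* ASCII 13, CARRIAGE RETURN */

/* Rewrite buf in place so that CR, LF, and CRLF each become a single LF.
   Returns the new length.  The comparisons use explicit octal codes
   because the values of '\n' and '\r' are up to the implementation. */
size_t normalize_eol(char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    size_t r = 0;   /* read index */
    size_t w = 0;   /* write index, never ahead of r */
    while (r < len) {
        char c = buf[r++];
        if (c == CR) {
            if (r < len && buf[r] == LF)
                r++;            /* swallow the LF half of a CRLF pair */
            buf[w++] = LF;
        } else {
            buf[w++] = c;       /* LF and ordinary characters pass through */
        }
    }
    return w;
}
```

The buffer should be read from a file opened in binary mode (with "b"),
so that the C library has not already translated anything.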

Peter Prymmer



