
Re: [MacPerl] Recognizing Unix Line Breaks



Lasse Hillerøe Petersen <lassehp@imv.aau.dk>
}Paul J. Schinder wrote:
}
}>To most of "everyone else", \n means \015\012 (Microsoft is a monument to
}>human stupidity in more ways than one).
}
}I agree with the rest of that paragraph, which is why I deleted it.
}However, I'm sure CRLF precedes MicroSnot DOS by a decade or so. It

Sure, but they didn't have to keep it.  Apple didn't think it necessary for
the Macintosh.  The Unix model of one character indicating the end of line
was available.  Gates may have had a reason, but it wasn't written in stone.

}certainly was used in CP/M, and judging by its existence in various RFCs,
}dating back before anyone had dreamed of putting DOS machines on the
}Internet, I would think other old OSes (VMS?) use it as well. In a wicked
}way CRLF even makes sense on slow machines with slow output devices and
}moderate storage. Just as \n makes sense on relatively fast machines with
}slow output devices and moderate storage (Here, processing of \n into CRLF
}is not a problem.) Today, it shouldn't really matter which one to use, so
}we _should_ probably apply the Postel principle of liberal handling of
}input and conservative generation of output. I love that principle.
}Alas, there is no easy way to have Perl recognize all three of CRLF, CR and
}LF as input line-breaks, is there?

Only cumbersome solutions, but it can be done.  For example, reading DOS or
Unix files line by line can be done by this:

read(FILE, $buf, 2048);
@lines = split(/\015?\012/, $buf);  # if only \015?\012? would work
# drop the empty first element left when the previous block ended in the
# \015 of a \015\012 pair
shift @lines if @lines && $lines[0] eq '';
push(@file, @lines);  # store them with the others

but don't try this on a Mac file.  But Mac files you can already handle, so
all you need is to detect what kind of file it is.  You can do this by
opening the file, reading a block, and then counting the \015's and \012's
with tr.
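
A rough sketch of that counting step, assuming the block has already been
read into a string.  The name guess_line_breaks and its return values are
made up for illustration, and treating "both characters present" as DOS is
a simplification that a mixed file would defeat:

```perl
# Hypothetical helper: guess the line-break convention of a block of text
# by counting carriage returns (\015) and line feeds (\012) with tr.
sub guess_line_breaks {
    my ($buf) = @_;
    my $cr = ($buf =~ tr/\015//);   # tr with an empty replacement only counts
    my $lf = ($buf =~ tr/\012//);
    return 'unknown' unless $cr || $lf;   # no line breaks in this block
    return 'mac'  if $cr && !$lf;         # CR only
    return 'unix' if $lf && !$cr;         # LF only
    return 'dos';                         # both present: assume CRLF
}
```

Something like guess_line_breaks($buf) right after the first read would
then tell you which splitting code to use for the rest of the file.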

It can be done, but it isn't that pleasant.  There may be a cleverer regexp
that can be used to do all three types at once, but I've never been able to
find one.
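
For what it's worth, one candidate is an alternation that tries \015\012
before the single characters.  This is only a sketch: split_lines is a
made-up name, and it assumes the whole text is in one string, since a block
that happens to end in \015 cannot be told apart from half of a \015\012
pair:

```perl
# Hypothetical helper: split text on CRLF, CR or LF line breaks.
sub split_lines {
    my ($text) = @_;
    # \015\012 comes first in the alternation so that a DOS line break
    # is consumed as one separator instead of two.
    return split(/\015\012|\015|\012/, $text);
}

my @mixed = split_lines("dos\015\012unix\012mac\015last");
# @mixed is ("dos", "unix", "mac", "last")
```

Note that split drops trailing empty fields, so blank lines at the very end
of the text are lost.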

}
}BTW, I have noted that some HTTP servers do not send CRLF as they should,
}and on one occasion a server didn't like CR at all in GET commands. Has
}anyone else experienced this? This actually broke LWP, and I had no way to
}force LWP to use a different line-break convention to deal with this
}server, very apropos. I solved it by using sockets directly and writing my
}own GET commands. (Also, judging from my observations, it seems Netscape
}sends GET <URL> LF rather than GET <URL> CRLF. I know this is not very
}MacPerl relevant, but I'd appreciate confirmation of this in private mail.)

Yes, many don't do things right.  LWP will take care of the ones that send
\015\012, but there are many that send out Unix text.  I always deal with
this by checking every file I download and converting if necessary.

}
}>$/ doesn't do what you'd like \s to do.  $/ gets appended to *every* line
}>you print, even when you don't really want it.
}
}No it doesn't? Or what do you mean by "gets appended to *every* line"?

Illegal instruction...brain dumped.  I shouldn't try to do things as
complicated as discriminating between $/ and $\ without benefit of
caffeine.  Sorry.
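
To keep the two straight: $/ is the input record separator, which decides
where <FILE> ends a line, and $\ is the output record separator, which is
appended to every print.  A small sketch using in-memory filehandles (these
need a newer Perl than many installations have, so take it as illustration
only):

```perl
# $\ (output record separator) is appended to every print.
my $out = '';
open(my $fh, '>', \$out) or die "cannot open in-memory handle: $!";
{
    local $\ = "\015\012";
    print $fh "one";
    print $fh "two";
}
close($fh);
# $out is now "one\015\012two\015\012"

# $/ (input record separator) decides where a read line ends.
my $src = "a\015b\015";
my @recs;
{
    local $/ = "\015";
    open(my $in, '<', \$src) or die "cannot open in-memory handle: $!";
    @recs = <$in>;
    close($in);
}
# @recs is ("a\015", "b\015")
```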

}
}-Lasse


---
Paul J. Schinder
NASA Goddard Space Flight Center
Code 693, Greenbelt, MD 20771
schinder@pjstoaster.pg.md.us


