
[MacPerl] On the meaning of '\n'...



At 04:12 96-03-11, Jens Christoffersson wrote:

>$string =~ s/\n/<br>/g;
>
>That line of code substitutes all linebreaks with "<BR>" in $string
>(which consists of the text written in a <textarea> tag). It works
>fine on UNIX, but not with MacPerl. MacPerl.Specifics says that "\n" on a Mac
>does not have the same ASCII number as "\n" on UNIX, and that might be the
>problem here. Anyway, can anyone tell me what that piece of code should look
>like on a Mac?

You don't say where the text with the newlines in it comes from.
If it's something the user entered in a form, then the actual bit
pattern is browser- and platform-dependent.
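When the text comes from a form, one way to find out which bit pattern a particular browser actually sent is to count the line-ending sequences in the string. The helper below, `count_line_endings`, is a hypothetical name used only for illustration. It relies on explicit octal escapes rather than "\n", so the answer does not depend on what "\n" means on the machine running the script.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): count which
# line-ending conventions appear in a string. Octal escapes are used
# so the result does not depend on the local meaning of "\n".
sub count_line_endings {
    my ($text) = @_;
    my %count = (crlf => 0, cr => 0, lf => 0);
    # CR LF is listed first in the alternation, so an MS-DOS pair is
    # counted once instead of as a lone CR followed by a lone LF.
    while ($text =~ /(\015\012|\015|\012)/g) {
        my $eol = $1;
        if    ($eol eq "\015\012") { $count{crlf}++ }
        elsif ($eol eq "\015")     { $count{cr}++ }
        else                       { $count{lf}++ }
    }
    return \%count;
}

my $sample = "line one\015\012line two\015line three\012";
my $c = count_line_endings($sample);
print "CRLF: $c->{crlf}, CR: $c->{cr}, LF: $c->{lf}\n";   # CRLF: 1, CR: 1, LF: 1
```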

If it's a text file you moved over from Unix, then it depends on how you
moved it.  If you moved it as a binary file, then the text file still has
the Unix line-ending character (LF) in it, which is the wrong one for the
Mac.  If you transfer the files properly (FTP text mode, Kermit, etc.),
then your code should work fine again.
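A text-mode transfer from Unix to a classic Mac OS system effectively rewrites each LF terminator as a CR. The sketch below imitates that step in Perl, assuming plain LF-terminated Unix text; `unix_to_mac` is a hypothetical name, not part of MacPerl or any transfer tool. A binary transfer skips this step and leaves the LF bytes in place, which is why a pattern written with "\n" on the Mac then finds nothing to match.

```perl
use strict;
use warnings;

# Hypothetical helper: approximate what a text-mode (ASCII) transfer does
# when moving a Unix file to classic Mac OS, i.e. replace every LF
# (octal 012) line terminator with a CR (octal 015).
sub unix_to_mac {
    my ($text) = @_;
    (my $converted = $text) =~ tr/\012/\015/;
    return $converted;
}

my $unix = "first\012second\012";
my $mac  = unix_to_mac($unix);
print ord(substr($mac, -1)), "\n";   # 13, i.e. ASCII CR
```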

It sounds as if you're trying to preserve the formatting of a text
file, regardless of line endings.  Something like the following should
work in all normal cases on all platforms, even if it is a bit longer:

my $CR = "\015";   # ASCII carriage return
my $LF = "\012";   # ASCII line feed

$string =~ s/$CR$LF/$CR/og;    # collapse the MS-DOS two-character CR LF newline to CR
$string =~ s/$CR|$LF/<BR>/og;  # replace each remaining CR or LF (Mac, Unix) with <BR>
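To check the two substitutions above against each convention, the following wraps them in a subroutine and applies it to the same two lines in MS-DOS, Mac, and Unix form. The name `text_to_html_breaks` is hypothetical and only used for this illustration; the substitutions themselves are the ones from this message.

```perl
use strict;
use warnings;

my $CR = "\015";   # ASCII carriage return
my $LF = "\012";   # ASCII line feed

# Hypothetical wrapper around the two substitutions from the post.
sub text_to_html_breaks {
    my ($string) = @_;
    $string =~ s/$CR$LF/$CR/og;    # collapse MS-DOS CR LF pairs to a single CR
    $string =~ s/$CR|$LF/<BR>/og;  # turn each remaining CR or LF into <BR>
    return $string;
}

for my $input ("one${CR}${LF}two${CR}${LF}",   # MS-DOS
               "one${CR}two${CR}",             # classic Mac OS
               "one${LF}two${LF}") {           # Unix
    print text_to_html_breaks($input), "\n";   # each prints one<BR>two<BR>
}
```

Collapsing CR LF first is what keeps an MS-DOS newline from producing two <BR> tags. A single substitution, s/\015\012|\015|\012/<BR>/g, would give the same result because alternation tries the two-character pair first; the two-step version is arguably easier to read.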

The following is *not* directed to Jens, but rather at the community at large:
<<Pedantic Mode On>>
There is some confusion about the meaning of '\n' in C, and hence in
Perl.  (Actually, the confusion is among some C programmers; the
standards are quite clear about the issue.)  While there isn't a formal
definition of '\n' in Perl, the various references say "it has the
usual meaning [as C]".  Given that Perl is written in C, and uses
portions of the C library, I believe '\n' means the same in both
languages.

In C, '\n' DOES NOT map to a specific bit pattern.  '\n' is a
metacharacter that the various routines within the C library interpret
as representing the end of a line in a file in the operating system's
native file system.  The specific bit pattern (if any!) that '\n' maps
to on disk depends on the file system in use.
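Perl 5.8 and later make this kind of mapping visible through PerlIO layers, which postdate this message. As an analogue of what the C library I/O routines do, the sketch below writes "\n" through Perl's built-in `:crlf` output layer and then reads the file back with the `:raw` layer to see the bytes actually stored. The temporary file and the two sample lines are arbitrary choices for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file that is removed when the program exits.
my ($fh, $path) = tempfile(UNLINK => 1);
close $fh;

# The program only ever writes "\n"; the :crlf layer turns each one
# into CR LF on disk (the MS-DOS convention).
open my $out, '>:crlf', $path or die "cannot write $path: $!";
print $out "first\nsecond\n";
close $out or die "cannot close $path: $!";

# Read the file back without any translation to see the stored bytes.
open my $in, '<:raw', $path or die "cannot read $path: $!";
my $bytes = do { local $/; <$in> };
close $in;

my $pairs = () = $bytes =~ /\015\012/g;
print "file holds ", length($bytes), " bytes with $pairs CR LF pairs\n";
```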

It so happens that '\n' is stored as a specific 8-bit pattern in
strings in memory, and on many systems (e.g. Unix) that pattern is also
the one the file system convention uses for end of line, but this is
not something that a programmer seeking to write portable code should
ever rely upon.
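A quick way to see which in-memory pattern a given Perl uses is to ask for the ordinal of "\n". This is only a diagnostic sketch: on Unix perls the answer is 10 (ASCII LF), while perlport lists classic MacPerl as storing "\n" as ASCII CR (13), which matches the difference the MacPerl.Specifics note quoted above describes.

```perl
use strict;
use warnings;

# Report the byte value this perl stores for "\n" in a string.
my $nl_code = ord("\n");
my $name = $nl_code == 10 ? 'LF'
         : $nl_code == 13 ? 'CR'
         :                  'other';
printf "\"\\n\" is stored as byte %d (%s) on this system\n", $nl_code, $name;
```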

(Old-time Mac programmers will recall that early Mac compilers used
different internal 8-bit patterns.  However, the C library I/O routines
always mapped '\n' to ASCII CR in disk files.)
<<Pedantic Mode Off>>

Because of the above, I (try to) never use '\r', and also never use
'\n' when I mean ASCII CR.  It takes a bit of getting used to, but now
I rarely get bitten by line termination issues in my C and Perl code.
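In Perl, following this rule usually means naming the exact bytes with octal escapes wherever a specific terminator is required, such as the CR LF that Internet protocols like SMTP and HTTP expect on the wire. The helper `protocol_line` and the sample command are hypothetical, chosen only to illustrate the habit.

```perl
use strict;
use warnings;

# Octal escapes name the exact bytes, unlike "\r" and "\n", whose values
# depend on the platform (classic MacPerl swaps them relative to Unix).
my $CR   = "\015";
my $LF   = "\012";
my $CRLF = "$CR$LF";

# Hypothetical helper: terminate a line the way many Internet protocols
# require on the wire, whatever "\n" means locally.
sub protocol_line {
    my ($text) = @_;
    return $text . $CRLF;
}

my $line = protocol_line("HELO example.com");
printf "%d bytes, last two are %d and %d\n",
    length $line, ord(substr($line, -2, 1)), ord(substr($line, -1));
# 18 bytes, last two are 13 and 10
```

Perl's standard Socket module offers the same constants through its `:crlf` export tag ($CR, $LF, $CRLF), which saves defining them by hand in network code.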

--Hal
Hal Wine <hal@dtor.com>     voice: 510/482-0597