lassehp@imv.aau.dk, mac-perl@iis.ee.ethz.ch, rdm@cfcl.com
Subj: Re: [MacPerl] The MacPerl Pages and book (update)

Lasse H Petersen's computer wrote:
!Mime-Version: 1.0
!Content-Type: text/plain; charset="iso-8859-1"
!Content-Transfer-Encoding: quoted-printable

Lasse H Petersen wrote:
!At 17:27 -0800 29/10/1997, Peter Prymmer wrote:
!>I was awaiting either the "MacPerl Oddities" or perhaps the
!>"Idioms And Programming Paradigms" chapter to mention something that I
!>hadn't realized til this past August when I heard Ken Lunde's
!>presentation at the Perl Conference. In it he remarked that MacPerl's \w
!>regexp meta-character will match some of the high bit characters in the
!>Mac extended ascii code page. This is perhaps worth careful emphasis not
!>only because of its difference from unix but also in light of the use of
!>the ISO-Latin-1 charset in cgi scripts, where character set rendering
!>issues become blurry.
!
!>P.S. a test script that exhibits this behavior is simple to construct: put
!>some regular chars and option-chars into a $scalar then examine the array
!>returned by a split(/\w/,$scalar) and see where the funny chars lie. e.g.
!>
!>print(join(">",split(/\w/,"string with funny chars")));
!>
!>(where I have avoided actual 8-bit chars to prevent accidental
!>MIME-ification of this email.)
!
!How about:
!$string_with_funny_chars = join("",map {chr} (32 .. 255) ) ;
!print $string_with_funny_chars,"\n";
!print(join("*",split(/\w/,$string_with_funny_chars))),"\n";
!
!(that shouldn't be prone to MIME-garbling, except the "=", perhaps.)

Yep that is a great way to do it (and yes there was an "=3D" MIME problem).

!For MacPerl 514b2 this gives stars only for [0-9A-Za-z_], which is not
!different from IRIX. It doesn't match any character with ord() >= 128.

According to Ken and to the tests that I did yesterday on an older MacPerl
there were several characters that matched with ord() >= 128.
In fact, borrowing your idea:

    $string = join("",map {chr} (128 .. 255) );
    @chars = split(/\w/,$string);
    print "$] $#chars\n";

resulted in:

    5.00201 59

on the Mac in question. On an Alpha VMS box that is handy I obtain
"5.00401 0", on a Digital UNIX box I obtained "5.00301 0", on an RS/6000
AIX box I obtained "5.00403 0", and on OS/390 OE V1R3 (an EBCDIC computer)
I obtain:

    $RCSfile: perl.c,v $$Revision: 5.0 $$Date: 92/08/07 18:25:50 $
    Patch level: 0
     -1

!However, on IRIX, if I do (stolen from perldoc POSIX):
!use locale;
!use POSIX;
!POSIX::setlocale( &POSIX::LC_ALL, "es_AR.ISO8859-1" );
!
!then all the ISO-8859-1 accented letters become stars. Obviously this
!doesn't work with MacPerl, as the Mac doesn't use ISO-8859-1, and the
!locale pragma is not supported with MacPerl, it seems.
!
!So I don't quite get what you mentioned about Ken Lunde's presentation?
!As I see it, the only difference is that on Unix you can use locale to
!change the behaviour of \w, whereas with MacPerl you cannot. Not that this
!wouldn't be desirable, mind you.

\w will match several chars in the 128..255 range under MacPerl 5.002_01,
and, without the use of perl's locale sensitivity, ASCII UNIX and VMS
machines will not match any of those (and the only EBCDIC port of perl that
I have that can compile that test script is kind of whacky anyway).

I am sorry but I am unable to get access to a recent MacPerl version.
Perhaps you could report the output of running the above script with the
more recent MacPerl? Thanks.

Peter Prymmer

***** Want to unsubscribe from this list?
***** Send mail with body "unsubscribe" to mac-perl-request@iis.ee.ethz.ch