
[MacPerl] Cross-platform conversion tables for accented chars



At 12:29 27/04/96 -0400, "jose.stephane@uqam.ca" <Stephane.Jose@uqam.ca> wrote:
>Bart Lateur wrote:
>
>> > A bigger difference is those accented characters you're talking about.
>> > These are *not* part of the standard ASCII set, and have codes between 128
>> > and 255. I know of 4 platforms: Mac, PC DOS (OEM), PC Windows (ANSI), Unix
>> > (probably ANSI as well). Each has its own "standard".
>> ...
>> > where the .... 's are replaced by a list of 128 characters, the translation
>> > table. If anyone's interested, I can post my tables for DOS>MAC and
>> > ANSI>MAC.
>
>I'd definitely appreciate seeing those tables.

Here it comes!


I hadn't included the tables in my original post because I needed a
reliable way to transfer them, independent of your computer system and
mail program.

I found an elegant way, provided you have a working copy of Perl. ( ;-)
I use Perl's built-in uudecoding to restore the original files.

So I've included a Perl script *in* this message, that will decode the tables.

Copy/paste this script into your MacPerl window, and run it. You'll get 8
files: 4 .dat files of 128 bytes each (check it!), plus 4 .pl files, each
holding a one-line Perl statement that you can paste into your own scripts
wherever the conversion is needed, of the form:

        tr/\200-\377/ ... /;

Note that these translations should work unchanged (!) on any platform.
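
If you would rather read a table from its .dat file at run time than paste
the tr/// line, something like the sketch below might do. It is only an
illustration: translate_high_bytes is a made-up name, it uses s///e instead
of tr/// because the table is not known at compile time, and the table in
the demo is an identity table with a single real entry, not one of the
tables from this message.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the posted script): map every byte in
# the range 128..255 through a 128-byte table, leaving ASCII untouched.
sub translate_high_bytes {
    my ($text, $table) = @_;
    die "table must be exactly 128 bytes\n" unless length($table) == 128;
    # The table byte at offset (code - 128) replaces each high byte.
    $text =~ s/([\200-\377])/substr($table, ord($1) - 128, 1)/ge;
    return $text;
}

# Stand-in table: identity for every byte except one entry.
# 0x82 is e-acute in DOS code page 437 and 0x8E is e-acute in Mac Roman;
# a real table would fill in all 128 entries this way.
my $table = join '', map { chr($_) } 128 .. 255;
substr($table, 0x82 - 128, 1) = chr(0x8E);

my $mac = translate_high_bytes("caf\x82", $table);
print join(' ', map { sprintf '%02X', ord($_) } split //, $mac), "\n";
# prints "63 61 66 8E"
```

With a real table you would read the 128 bytes from the .dat file (in
binmode) and pass them in place of $table.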

These tables do not handle the newline problem. That one is not difficult
to solve, but the solution depends on the source and target platforms.
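
One possible way to deal with the newlines, as a rough sketch: rewrite
whatever line ending is found to the one the target platform wants. The
name convert_newlines is hypothetical, and the sketch assumes the text
only uses CR LF, bare CR or bare LF as line endings.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite any of the three common line endings
# (CR LF, bare CR, bare LF) to the one given in $eol.
sub convert_newlines {
    my ($text, $eol) = @_;
    # Octal escapes are used instead of "\n" and "\r" because the byte
    # values of those two depend on the platform Perl runs on.
    $text =~ s/\015\012|\015|\012/$eol/g;
    return $text;
}

my $dos_text = "line one\015\012line two\015\012";
my $mac_text = convert_newlines($dos_text, "\015");   # classic Mac OS uses CR
```

Pass "\015\012" for DOS/Windows or "\012" for Unix as the second argument.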


A good idea might be to save these as BBEdit extension files (check
Matthias' FTP site). I haven't tried this out yet, but it might be a great
way to add cross-platform conversion to BBEdit.

Note that you might need to hunt for the place *where* your files were
saved. Check MacPerl's home directory first.


I don't think my tables are perfect. That isn't even possible, because not
all characters are represented in every set. E.g. the Mac doesn't have the
capital E with an umlaut, and the ANSI set doesn't know about the O/E
ligature (!) (How do the French deal with this?)

But I do use these conversions a lot professionally, to translate text
files from Windows to Macintosh. I haven't encountered any real problems so
far.


Good luck!!!

Bart.


#! perl
&savePl('dos2mac',<<"" );
\>\@I\^\.B8J\(C\-B0D8\^5E\)\.\`\@8\.\^KIF\:F\)Z\=V\(6\?KZ\.O
\>UX\,\@DI\>\<EH2\[O\,\"HPKT\@P\<\?\(7U\]\?\?\'R\'\(\,NI\?\'PK
\>\*Z\*T\*RLM\+2LM\*XN\+\*RLM\+7PM\*R\/PT\)\"1CVF2E\)4K
\>\*U\]\?\?\)\-\?EZ\>9F\)N\;M\:\>GG\)Z\=\>5G1J\]\"Q7\[ZFI\)\>X
\(H\:RWN\;\.R7R\`\`

&savePl('dos2win',<<"" );
\>Q_SIXN3\@Y\>\?JZ\^CO\[NS\$Q\<GFQO3V\\OOY_\]\;\<\^\*\/8
\>UX\,\@\[\?\/Z\\\=\&JNK\^N\(\+T\@H\:N\[7U\]\?IJ\;\!\(\,\"IIJ8K
\>\*Z\*E\*RLM\+2LM\*\^\/\#\*RLM\+\:8M\*Z3PT\,K\+R\&G\-SL\\K
\>\*U\]\?ILQ\?T\]_4TO75M\?\[\>VMO9_\=VOM\*VQ7\[X\@I_\>X
\(L\*BWN\;\.R7R\`\`

&savePl('mac2win',<<"" );
\>Q\,7\'R\=\'6W\.\'\@XN3CY\>\?IZ\.KK\[\>SN\[_\'S\\O3V\]\?KY
\>\^_R\@L\*\*CIY6VWZZIF\;2L\@\,\;8\@\;\&\*C\:6UMH\^0FIVJ
\>NI\[F\^\+\^AK\*\:\#K\:\^KNX6\@P\,\/5C\)R6EY\.4D9\+WLO\^\?
\>LZ2\+F\[F\\A\[\>\"A\(G\"RL\'\+R\,W\.S\\S3U\+W2VMO9OHB8
\(T\-\?\=WKCP_\?X\`

&savePl('win2mac',<<"" );
\>\@\(\&\"\@X2\%AH\>\(B8J\+C\(V\.CY\#4U\=\+3I\=\#1F\)F\:FYR\=
\>GI_\*P\:\*C\(\[1\\I\*RIN\\\?\"T\*C1H\;\&RLZNUIK\>XN\;S\(
\>O\+V\^P\,N\'B8N\`\@\:Z\"CX\.0D9\.2E\)70A\)B7F9N\%UZ\^\=
\>G\)Z\&6\:\>GB\(\>\)BXJ\,OMB\/CI\"1DY\*4E\?\"6F\)\>9FYJ7
\(KYV\<GI\]YI\]\@\`

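sub savePl {
   local($file,$uuencoded)=@_;
   local($,,$\,$_);
   # decode the uuencoded text into the raw 128-byte table
   $_=unpack("u",$uuencoded);
   # write the raw table
   open(OUT,">$file.dat") || die "Can't create $file.dat: $!";
   binmode(OUT);
   print OUT;
   close(OUT);
   # write the one-line tr/// statement built from the table
   open(OUT,">$file.pl") || die "Can't create $file.pl: $!";
   binmode(OUT);
   print OUT "tr/\\200-\\377/$_/;";
   close(OUT);
   print STDERR "$file ",length($_)," bytes\n";
   # should be 128 every time!
}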
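
The decoding step in savePl relies on unpack("u", ...), the counterpart of
pack("u", ...). A minimal round-trip sketch, in case you want to see the
mechanism on its own; the 128-byte identity table here is only a stand-in,
not one of the real conversion tables.

```perl
use strict;
use warnings;

# Stand-in data: 128 bytes holding the codes 128..255 (an identity table,
# not one of the real conversion tables).
my $table = join '', map { chr($_) } 128 .. 255;

# pack("u", ...) produces uuencoded text lines; unpack("u", ...) reverses it.
my $encoded = pack("u", $table);
my $decoded = unpack("u", $encoded);

print STDERR "round trip ", ($decoded eq $table ? "ok" : "FAILED"),
             ", ", length($decoded), " bytes\n";
```

The uuencoded text contains only printable ASCII characters, which is what
lets the tables travel through mail in one piece.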