
[MacPerl] Non-US Characters



>>> Robin McFarlane <robinmcf@altern.org> - 7/24/98 5:58 AM >>>
...
> I apologise if I'm wandering (again) from the main point of the
> discussions here
[Discussed 'global' software...]
> As a small aside I tried to FTP files containing Japanese HTML onto a
> server using Anarchie. Normally good sturdy, robust, doesn't hang and
> gets the job done, Anarchie doesn't make a faithful copy (not being
> savvy to non English characters) and translates all the textual content
> into gibberish, so I have to do the job by hand. Lack of foresight? bad
> programming?

(Although this seems to be wandering off-topic, I think the general
issues of handling different character encoding systems are relevant to
Perl and MacPerl.)

I expect the problem you are having arises because your server is
probably a Unix machine using the EUC method of encoding Japanese
characters, while the Mac (and Windows) use Shift-JIS.  Although ASCII
has become almost completely dominant as a way of encoding 'US' text, in
Japan there are several methods of encoding Japanese and none of them
dominates.  I suppose Unicode may eventually become widely supported in
operating systems, and it will then be possible to write programs that
transparently handle all the world's languages.  Until that happens I
think it's a bit unfair to accuse someone writing an application like an
FTP client of bad programming because they haven't included translators
for lots of different character encoding standards.  After all, even the
issue of the best way to handle different line break characters (CR on
Mac, LF on Unix, CRLF on DOS/Windows) causes endless, heated and
inconclusive debate on this list:-)
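
To make the encoding difference concrete, here is a minimal sketch in
Python (used here only because its standard library happens to ship
both codecs).  The sample string is just an illustration, not anything
from the files under discussion.  It shows that the same Japanese text
becomes different bytes under Shift-JIS and EUC, which is why a file
copied byte-for-byte between the two worlds can display as gibberish.

```python
# Illustrative sample text (an assumption, not taken from the thread).
text = "日本語 Perl"

sjis = text.encode("shift_jis")  # encoding typical on Mac and Windows
euc = text.encode("euc_jp")      # encoding typical on Unix

# The Japanese characters get different byte sequences in each encoding...
print(sjis.hex())
print(euc.hex())

# ...while the plain ASCII part is byte-for-byte identical in both.
ascii_same = "Perl".encode("shift_jis") == "Perl".encode("euc_jp")
print(ascii_same)

# Reading EUC bytes as if they were Shift-JIS does not recover the text;
# errors="replace" substitutes a marker for byte sequences that are invalid.
garbled = euc.decode("shift_jis", errors="replace")
print(garbled)
```

Decoding with the matching codec recovers the text exactly; only the
mismatched decode is lossy.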

As far as Perl and global compatibility go, I suppose proper support
will be left until Unicode is common on Unix OSes, and for MacPerl this
would depend on Apple incorporating it into MacOS.  For example, if you
look in the RegExp section of the Perl FAQ you will see that, in the
case of multi-byte languages, it basically says that they are just too
difficult to handle nicely at the moment.  One advantage of open source
software like Perl, though, is that, while we're waiting for the slow
process of international standardization (slow for both technical and
political reasons), it is possible to customize versions for particular
requirements.  The benefit of supporting 'cool' characters probably
doesn't justify the extra effort of maintaining a separate version:-(
(Although, I seem to remember someone once saying they were going to
compile their own version to handle line breaks the way they preferred.)
In the case of Japanese, however, the benefit of being able to use
powerful Perl text processing on Japanese text probably does justify the
effort of maintaining the separate JPerl (and MacJPerl) versions.
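
As a rough illustration of why multi-byte text is awkward for
byte-oriented pattern matching, here is a small Python sketch (not Perl,
and not how JPerl itself works; the sample word is my own choice).  In
Shift-JIS the second byte of some characters has the same value as an
ASCII character, so a search that looks at bytes rather than characters
can "find" something that is not in the text at all.

```python
# Illustrative sample word (an assumption, not taken from the thread).
text = "表示"
sjis = text.encode("shift_jis")

# The second byte of the first character is 0x5C, the same value as an
# ASCII backslash, so a byte-level search reports a backslash...
found_in_bytes = b"\\" in sjis
print(found_in_bytes)

# ...even though the text, viewed as characters, contains no backslash.
found_in_chars = "\\" in text
print(found_in_chars)
```

A tool that understands the encoding compares whole characters and so
avoids this kind of false match.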

Marcus Sen

P.S.
I found the easiest way to transfer Japanese text between Mac and Unix
machines was to use a program called JCONV-DD (available, I think, at
info-mac), which you can set to do the conversion between Shift-JIS and
EUC (together with the CR to LF line break conversion) just by dropping
your text files onto it.  Then transfer the converted files with
Anarchie or whatever FTP client in binary mode.
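
For anyone without JCONV-DD to hand, the same two-step conversion can be
sketched in a few lines of Python.  This is only a sketch of the idea,
not a description of how JCONV-DD works: the function name is
hypothetical, and Python's plain "shift_jis" codec stands in for
whatever Shift-JIS variant the Mac file actually uses.

```python
def sjis_mac_to_euc_unix(data: bytes) -> bytes:
    """Convert Shift-JIS text with Mac line breaks to EUC with Unix ones.

    Hypothetical helper for illustration only.
    """
    # Decode first, so line breaks are replaced on characters, not raw bytes.
    text = data.decode("shift_jis")
    # Normalise CRLF and bare CR (Mac) line breaks to LF (Unix).
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.encode("euc_jp")


# Illustrative input: two lines of Mac-style Shift-JIS text.
mac_bytes = "日本語\rPerl\r".encode("shift_jis")
unix_bytes = sjis_mac_to_euc_unix(mac_bytes)
print(unix_bytes.hex())
```

The result is already in the form the Unix server expects, so it should
be sent in binary mode to stop the FTP client translating it again.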
