Re: [MacPerl] Japanese Pages getting corrupted

To: "Ward W. Vuillemot" <wwv@u.washington.edu>
Subject: Re: [MacPerl] Japanese Pages getting corrupted
From: riechert@pobox.com (Andreas Marcel Riechert)
Date: Mon, 4 Oct 1999 17:13:21 +0900 (JST)
Cc: macperl@macperl.org



At  6:55 PM 99.10.3 -0700, Ward W. Vuillemot wrote:
|>  Okay, I also have two subroutines that write HTML code before and
|>  after whatever is contained in $content_file.
|>   
|>  When I comment out these two subroutines (the calls to them) then the
|>  code's output is correct.  So I imagine the problem is that Netscape gets
|>  the file -- sees single-byte and reads it ALL as such.  However, when I 
|>  have only the $content_file read to the file then Netscape encounters 2-
|>  byte and does what it should do.  

How are we supposed to help if you show us only the code that works?
For example, if you used JIS encoding, you may have deleted or failed to
restore your escape sequences while processing. (For your glossary: this is
called "broken JIS", which is unfortunately very common. That is why routines
and applications like "fix JIS", "repair JIS", and "Mail-Fixer" are very
common as well :-)

|>  SO, can one mix multiple bytes in the system?  Or do I need to convert
|>  the file completely to S-JIS or EUC or JIS?  As I run my scripts pretty
|>  much without modification on Mac and UNIX I want a solution that works
|>  for both.  Ideas?

Japanese encodings are always a mixture of 1-byte and 2-byte characters,
so 7-bit ASCII does no harm. 7-bit ASCII and JIS-Roman are basically
the same. The backslash in JIS-Roman is displayed as the yen sign, but that
doesn't harm your HTML code.
Note that JIS-Roman does not mean "JIS encoded" as in "JIS vs. Shift-JIS
encoding". (JIS-Roman is included in S-JIS.)
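
To make the 1-byte/2-byte mixture concrete, here is a minimal Python sketch
(not MacPerl; the sample string and variable names are my own assumptions)
that encodes ASCII markup plus two kanji in each of the three encodings and
shows that the ASCII bytes pass through unchanged:

```python
# Sample text is an assumption for illustration: ASCII tags around two kanji.
text = "<b>日本</b>"

# Encode the same text in the three common Japanese encodings.
encoded = {codec: text.encode(codec)
           for codec in ("shift_jis", "euc_jp", "iso2022_jp")}

for codec, data in encoded.items():
    # Each kanji takes 2 bytes; each ASCII character takes 1 byte.
    # ISO-2022-JP ("JIS") additionally carries escape sequences.
    print(codec, len(data), data)

# The ASCII markup survives byte-for-byte in all three encodings.
for data in encoded.values():
    assert data.startswith(b"<b>") and data.endswith(b"</b>")
```

The JIS output is longer than the other two only because of the escape
sequences around the kanji.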

Anyway, if you don't know the basics of Japanese character encodings, you
will never write safe and stable applications. Get Ken Lunde's book or grab
the information from the Net. You should know when to use JIS or SJIS and
when not to, the code ranges, and so on.

For example, you cannot work with a variable like " $my_japanese_word "
in JIS encoding without understanding JIS encoding.
To print the word you have to set the escapes; to match it you don't want
the escape sequences. So working with JIS always means setting and
erasing escape sequences. (One reason I don't like JIS ;-)
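
The setting and erasing can be sketched roughly as follows. This is a Python
illustration, not MacPerl, and it assumes the word contains only JIS X 0208
kanji, so only the two most common escape sequences occur; the helper names
strip_escapes and wrap_escapes are hypothetical:

```python
ESC_KANJI = b"\x1b$B"  # ESC $ B: switch to 2-byte JIS X 0208
ESC_ASCII = b"\x1b(B"  # ESC ( B: switch back to 1-byte ASCII

def strip_escapes(jis_bytes):
    """Erase the escapes, leaving bare 7-bit byte pairs for matching."""
    return jis_bytes.replace(ESC_KANJI, b"").replace(ESC_ASCII, b"")

def wrap_escapes(bare_pairs):
    """Set the escapes again so the byte pairs print as kanji."""
    return ESC_KANJI + bare_pairs + ESC_ASCII

word = "日本".encode("iso2022_jp")   # escapes included
bare = strip_escapes(word)           # escapes erased

# Without escapes the pairs are indistinguishable from plain ASCII text,
# which is exactly what "broken JIS" looks like on screen.
print(bare.decode("ascii"))

# Restoring the escapes makes the word printable again.
assert wrap_escapes(bare).decode("iso2022_jp") == "日本"
```

A real script would also have to handle the other escape sequences
(JIS-Roman, half-width katakana, older JIS revisions), which this sketch
ignores.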

On Unix we usually work with EUC-JP encoding. But that DOES NOT mean you
cannot use SJIS (if you know what you're doing).
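
If you do end up with SJIS data on a system that expects EUC-JP, converting
between the two is mechanical. A minimal Python sketch (not MacPerl; the
function name sjis_to_euc is hypothetical, and the input is assumed to be
valid Shift-JIS):

```python
def sjis_to_euc(data):
    """Convert Shift-JIS bytes to EUC-JP bytes by going through Unicode."""
    return data.decode("shift_jis").encode("euc_jp")

sjis_bytes = "日本".encode("shift_jis")
euc_bytes = sjis_to_euc(sjis_bytes)

# Same characters, different byte values in each encoding.
print(sjis_bytes, euc_bytes)
assert euc_bytes.decode("euc_jp") == "日本"
```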

Anyway, before you proceed with your project, try to learn the basics of JIS,
EUC-JP and SJIS. Don't use Japanese characters in your source code if you
don't know what you're doing. (Some SJIS characters have the byte that
represents the backslash in ASCII, 0x5C, as their second byte, which can
escape a closing quote, etc.)
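
Whether a given character has a troublesome second byte can be checked
directly. A minimal Python sketch (not MacPerl; the function name risky_chars
and the sample string are my own assumptions) that looks for the backslash
byte 0x5C, a well-known second-byte collision in Shift-JIS:

```python
def risky_chars(text, codec="shift_jis", danger=b"\\"):
    """Return the characters whose 2-byte encoding ends in the danger byte."""
    found = []
    for ch in text:
        raw = ch.encode(codec)
        # Only 2-byte characters can hide an ASCII byte in second position.
        if len(raw) == 2 and raw[1:] == danger:
            found.append(ch)
    return found

sample = "表ソ日"
print(risky_chars(sample))                   # flagged under Shift-JIS
print(risky_chars(sample, codec="euc_jp"))   # EUC-JP second bytes are 8-bit
```

In EUC-JP both bytes of a 2-byte character have the high bit set, so this
kind of collision with ASCII punctuation cannot happen there.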

If you need help with some script, show us the part that doesn't work.
Netscape sometimes sucks as well, so it is always a good idea to
check your HTML output in a Japanese editor.

HTH,

Andreas Marcel 


