
Re: [MacPerl-Modules] XML::Parser: support for native Mac (Roman) character set



On Sun, 5 Dec 1999 10:38:08 -0400 (AST), Arved Sandstrom wrote:

>On Sat, 4 Dec 1999, Bart Lateur wrote:

>>  - Provided I get the encoding table right, would there be any interest
>> in distributing it? Would there be any reason to maybe include them in
>> the XML::Parser package, possibly only the Mac version?
>>
>Getting this done with XML::Encoding would be excellent.

It looks like I made it work. Well... I've messed about a bit, but the
end result looks right. Here are a few things people should look out for
in order to make the module and the two included scripts work on MacPerl:

1) "test.pl" as provided fails on test #7. But that is only because the
Mac uses a different format for its error reporting. Removing the
anchoring caret ("^") in the regular expression makes the test succeed.
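
To illustrate why the caret matters, here is a minimal sketch. The two
message strings are made up for the example (they are NOT the actual
text checked by test #7); the only assumption is that the Mac puts
something in front of the part the test is looking for.

```perl
use strict;

# Hypothetical error texts, for illustration only.
my $plain_style = "some parse error at line 3\n";
my $mac_style   = "# some parse error at line 3\n";   # assumed extra prefix

# Anchored: the message must START with the expected text.
sub matches_anchored   { return $_[0] =~ /^some parse error/ ? 1 : 0 }

# Unanchored: the expected text may appear anywhere in the message.
sub matches_unanchored { return $_[0] =~ /some parse error/ ? 1 : 0 }

print "anchored, plain:   ", matches_anchored($plain_style),   "\n";
print "anchored, mac:     ", matches_anchored($mac_style),     "\n";
print "unanchored, mac:   ", matches_unanchored($mac_style),   "\n";
```

The unanchored pattern is a bit less strict, but it accepts both
formats.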

2) It looks like the script "make encmap" is designed to process the
same format of tables. I had written my own script (which only processes
single-byte character sets). The main problem with "make encmap" is that
it depends too much on the command line. Making the "name" default to a
variation on the file name should allow you to save and run it as a
droplet, I guess. If you don't like the final name used, you can edit
the name attribute in the XML file.

I haven't done this yet. I will, soon.
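
Something along these lines is what I have in mind for the default
name. This is only a sketch: the function name and the exact rules
(strip the folders, strip the extension, lower-case, turn spaces and
underscores into hyphens) are my own assumptions, not anything taken
from "make encmap" itself.

```perl
use strict;

# Hypothetical helper: derive a default encoding name from a file path.
sub default_encoding_name {
    my ($path) = @_;
    my $name = $path;
    $name =~ s/^.*[:\/]//;     # drop folders (":" on the Mac, "/" elsewhere)
    $name =~ s/\.[^.]*$//;     # drop a trailing extension, if any
    $name = lc $name;          # encoding names appear to be lower case
    $name =~ s/[ _]+/-/g;      # spaces and underscores become hyphens
    return $name;
}

print default_encoding_name("Macintosh HD:Tables:Mac_Roman.txt"), "\n";
```

A droplet receives the dropped files in @ARGV, so the script could call
this for each file instead of asking for a name on the command line.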

3) The other script, "compile encoding", doesn't work as is with MacPerl,
because it needs Perl 5.005, and MacPerl (release) is still at 5.004.
But removing the use of the fields.pm module, and changing the pfxmap
object to an anonymous hash instead of what it was originally (line 35),
makes it work, and you can save and run it as a droplet. This one works.
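
For those who haven't met fields.pm: the change amounts to building the
object as an ordinary blessed hash reference. The sketch below shows
the general shape only. The package name and the key names are
hypothetical; they are not copied from "compile encoding".

```perl
use strict;

package PfxMapSketch;          # hypothetical class name

sub new {
    my ($class) = @_;
    # A plain anonymous hash; no "use fields" needed, so this runs
    # under Perl 5.004 as well. The keys are made up for the example.
    my $self = { len => 0, entries => {} };
    return bless $self, $class;
}

package main;

my $pfxmap = PfxMapSketch->new;
$pfxmap->{len} = 2;                    # ordinary hash access
$pfxmap->{entries}{0x8E} = 0x00E9;     # store one sample mapping
print "len = $pfxmap->{len}\n";
```

What you give up is the compile-time checking of key names that
fields.pm provides, so a mistyped key silently creates a new entry.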

>>  - What would be the proper name for the table? I'm thinking of
>> "Mac-Roman" for the Mac.
>>
>Adobe PDF refers to MacRoman. Why not here? :-)

I like that name. I found it on a great website all about character
encodings, but I lost the URL. I'll try to find it again; it's a very
useful reference. Anyway, the convention seems to be lower case names
only, and I misread "macroman" as "macro-man", which looks wrong.
"mac-roman" is understandable.

>I think you should also look at the IANA (Internet Assigned Numbers
>Authority) website (whatever it is) to see what character encodings are
>already registered with them as "charsets", so that you'll use the right
>name. It may already be there.

Huh, yeah. There is a textfile included with the module, called
"IANA-assigned-character-sets". The name used in this is "macintosh", or
"mac". I think that name is a bit Western-centric, almost colonial.
 
>>  - Most importantly: once in place, can I use these to decode XML files
>> into ordinary Mac text? What would be the module's syntax? Can I encode
>> XML files so that they're flagged as using Macintosh specific text, not
>> just ASCII?
>> 
>Well, yes to the latter. Once you've got an encoding in place then the XML
>declaration looks something like
>
><?xml version="1.0" encoding='EUC-KR'?>

Yup, encoding it as "mac-roman" allows me to do

   <?xml version="1.0" encoding='mac-roman'?>

in the file, or

   $parser->parse($xml, ProtocolEncoding => 'Mac-Roman');

in the parser. It looks like it works.

But that brings up another problem. It looks like the text it outputs is
always UTF-8 (a variable-length encoding of Unicode: plain ASCII stays
one byte, every other character becomes a multibyte sequence). I can't
find a way to turn this back into Macintosh text. Not with the plain
XML::Parser, at least.

That looks odd. The character encodings are provided for this module,
but they appear to be used in one direction ONLY: converting to Unicode
(and from there to UTF-8).

I can't find a UTF-8 decoding module at CPAN.

Besides, having to build yet another decoder to convert UTF-8 back to
Macintosh text looks like double work to me. At the least, such a
decoder should try to reuse the encoding files from XML::Parser.
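
As a rough idea of what such a decoder might look like, here is a
sketch in plain Perl that works on byte strings. It is not part of
XML::Parser. The table is hand-typed and holds only three Mac Roman
entries; a real version would build it from the same mapping data the
encoding file was made from. The function name is my own invention.

```perl
use strict;

# Partial Mac Roman byte => Unicode table (three sample entries only).
my %mac_to_unicode = (
    0x80 => 0x00C4,    # A with diaeresis
    0x8E => 0x00E9,    # e with acute
    0xA5 => 0x2022,    # bullet
);

# Invert it, so the same data serves for the way back.
my %unicode_to_mac = reverse %mac_to_unicode;

# Hypothetical helper: convert a UTF-8 byte string to Mac Roman bytes.
# Handles 1-, 2- and 3-byte sequences; unmapped characters become "?".
# It stops at the first malformed sequence and drops the rest.
sub utf8_to_mac {
    my ($bytes) = @_;
    my $out = '';
    while ($bytes =~ /\G([\x00-\x7F]|[\xC0-\xDF][\x80-\xBF]|[\xE0-\xEF][\x80-\xBF]{2})/gc) {
        my @b = unpack('C*', $1);
        my $cp;
        if    (@b == 1) { $cp = $b[0] }
        elsif (@b == 2) { $cp = (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F) }
        else            { $cp = (($b[0] & 0x0F) << 12)
                               | (($b[1] & 0x3F) << 6)
                               |  ($b[2] & 0x3F) }
        if    ($cp < 0x80)                 { $out .= chr($cp) }
        elsif (exists $unicode_to_mac{$cp}) { $out .= chr($unicode_to_mac{$cp}) }
        else                               { $out .= '?' }
    }
    return $out;
}

my $mac_text = utf8_to_mac("caf\xC3\xA9");   # UTF-8 bytes for "cafe" with e-acute
printf "%vX\n", $mac_text;                   # byte values, for inspection
```

The point of inverting the table rather than typing a second one is
exactly the "double work" complaint: there should be one set of mapping
data, used in both directions.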

Heh. I'm not out of this yet.

-- 
	Bart.
