[MacPerl] Yet another Mac2ISO character converter



I've been lurking on this list long enough that I thought it was time to
contribute something. I'm working on a script to convert scientific text from
Quark to HTML, since I've found that the existing tools do not meet our needs
(especially regarding 8-bit characters and the <SUB> and <SUP> tags).

I love Perl since I can write scripts at home under Linux that work at the 
office on our Macs!

The first step is to convert Mac characters to ISO-8859-1. Since practically 
any Mac character can occur, those that don't have direct ISO equivalents are 
converted to something legible.
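
For illustration, here is a minimal sketch of that idea using only four
sample characters. The names %mac2iso and convert_sample are made up for
this example and are not part of the script below, which uses a full
128-entry array instead.

```perl
use strict;
use warnings;

# Illustrative subset only; keys are Mac character codes.
my %mac2iso = (
    "\x8A" => "\xE4",   # a-umlaut: has a direct ISO-8859-1 equivalent
    "\xA5" => "*",      # bullet: no equivalent, becomes "*"
    "\xB2" => "<=",     # less-than-or-equal: approximated in 7-bit text
    "\xC9" => "...",    # ellipsis: approximated punctuation
);

sub convert_sample {
    my ($text) = @_;
    # Characters missing from this small sample table are left unchanged.
    $text =~ s/([\x80-\xff])/exists $mac2iso{$1} ? $mac2iso{$1} : $1/ge;
    return $text;
}

print convert_sample("x \xB2 y\xC9"), "\n";   # prints "x <= y..."
```

Note that the result can be longer than the input, as with the ellipsis
above.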

Does anyone else on the list use MacPerl to manipulate Quark Tags?

-----------------
# Mac to ISO-8859-1 legible text conversion
# Scott Hanson (scott@int-res.com, shanson@mail.hh.provi.de)

# This code snippet converts 8-bit Mac characters to ISO-8859-1 equivalents
# or a legible 7-bit equivalent. I'm assuming that the source is
# scientific text where any Mac character might occur and should be
# translated, that we won't need to translate back to the Mac character
# set, and that it's OK if the result text is longer than the source.

# Generally, if a character does not have an ISO-8859-1 equivalent,
# -- bullets, daggers, etc. become *
# -- Greek letters and math symbols are spelled out or approximated
# -- punctuation is approximated
# -- accents are ignored 

# This will eventually be part of a custom Quark-Tags-to-HTML script. There
# are no HTML tags in the output; the Latin-1 tags in the comments are
# just a handy way to identify the characters. Names for the other 
# characters are just made up.

@trans = (

"\304", "\305", "\307", "\311",  "\321",  "\326", "\334", "\341",  
# Mac \200 - \207
# &Auml; &Aring; &Ccedil; &Eacute; &Ntilde; &Ouml; &Uuml; &aacute;

"\340", "\342", "\344", "\343", "\345", "\347", "\351", "\350", 
# Mac \210 - \217
# &agrave; &acirc; &auml; &atilde; &aring; &ccedil; &eacute; &egrave;

"\352", "\353", "\355", "\354", "\356", "\357", "\361", "\363",
# Mac \220 - \227
# &ecirc; &euml; &iacute; &igrave; &icirc; &iuml; &ntilde; &oacute;

"\362", "\364", "\366", "\365", "\372", "\371", "\373", "\374",
# Mac \230 - \237
# &ograve; &ocirc; &ouml; &otilde; &uacute; &ugrave; &ucirc; &uuml; 

"*", "\260", "\242", "\243", "\247", "*", "\266", "\337", 
# Mac \240 - \247
# dagger &deg; &cent; &pound; &sect; bullet &para; &szlig;

"\256", "\251", "(TM)", "\264", "\250", "<>", "\306", "\330", 
# Mac \250 - \257
# &reg; &copy; trademark &acute; &uml; not_eq  &AElig;  &Oslash;

"infinity", "\261", "<=", ">=", "\245", "\265", "delta", "Sigma",
# Mac \260 - \267
# infinity &plusmn; less/eq great/eq &yen; &micro; delta Sigma

"Pi", "pi", "integral", "\252", "\272", "Omega", "\346", "\370",
# Mac \270 - \277
# Pi pi integral &ordf; &ordm; Omega &aelig; &oslash;

"\277", "\241", "\254", "sqrt", "f", "~=", "Delta", "\253",
# Mac \300 - \307
# &iquest; &iexcl; &not; radical florin apeq Delta &laquo;

"\273", "...", "\240", "\300", "\303", "\325", "OE", "oe", 
# Mac \310 - \317
# &raquo; ellipsis &nbsp; &Agrave; &Atilde; &Otilde; OE oe

"-", "--", '"', '"', "'", "'", "\367", "*",
# Mac \320 - \327
# endash emdash open_dqt clos_dqt open_qt clos_qt &divide; diamond

"\377", "Y", "/", "*", '"', '"', "fi", "fl", 
# Mac \330 - \337
# &yuml; Y.umlaut fraction circ.x open_qt clos_qt fi.lig fl.lig

"*", "\267", "'", '"', "o/oo", "\302", "\312", "\301",
# Mac \340 - \347
# dbldagger &middot; under_qt under_dqt per_mil &Acirc; &Ecirc; &Aacute;

"\313", "\310", "\315", "\316", "\317", "\314", "\323", "\324", 
# Mac \350 - \357
# &Euml; &Egrave; &Iacute; &Icirc; &Iuml; &Igrave; &Oacute; &Ocirc;  

"*", "\322", "\332", "\333", "\331", "i", "", "", 
# Mac \360 - \367
# apple &Ograve; &Uacute; &Ucirc; &Ugrave; dotless_i circumflex tilde

"\257", "", "", "", "", "", "", "" );
# Mac \370 - \377
# &macr; breve dot ring_above cedilla hung_umlaut ogonek caron

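# Apply to $_: replace each 8-bit character with its entry in @trans
# (index 0 corresponds to Mac \200).
s/([\x80-\xff])/$trans[ord($1) - 128]/ge;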
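
If the substitution is needed in more than one place, it could be wrapped
in a subroutine roughly like this. This is only a sketch: the name mac2iso
and the demo table are hypothetical, and the size check is an added
assumption that guards against a miscounted table, since the lookup relies
on there being exactly 128 entries.

```perl
use strict;
use warnings;

# Hypothetical helper: convert one string using a 128-entry table passed
# by reference (index 0 corresponds to Mac character \200).
sub mac2iso {
    my ($text, $table) = @_;
    die "translation table must have 128 entries, got "
        . scalar(@$table) . "\n"
        unless @$table == 128;
    $text =~ s/([\x80-\xff])/$table->[ord($1) - 128]/ge;
    return $text;
}

# Demo table: every 8-bit character becomes "?" except two sample entries.
my @demo = ("?") x 128;
$demo[0x80 - 128] = "\304";   # Mac A-umlaut -> ISO-8859-1 A-umlaut
$demo[0xB9 - 128] = "pi";     # Mac small pi -> spelled out

print mac2iso("2\xB9r", \@demo), "\n";   # prints "2pir"
```

With the real table, the call would be mac2iso($line, \@trans) for each
line read from the input file.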

-- 
Scott Hanson     <shanson@mail.hh.provi.de>   Asendorf, Germany
work: Inter-Research Science Publisher  <scott@int-res.com>
http://www.int-res.com