I've been lurking on this list long enough that I thought it was time to
contribute something.

I'm working on a script to convert scientific text from Quark to HTML,
since I've found that the existing tools do not meet our needs (especially
regarding 8-bit characters and the <SUB> and <SUP> tags). I love Perl
since I can write scripts at home under Linux that work at the office on
our Macs!

The first step is to convert Mac characters to ISO-8859-1. Since
practically any Mac character can occur, those that don't have direct ISO
equivalents are converted to something legible.

Does anyone else on the list use MacPerl to manipulate Quark Tags?

-----------------

# Mac to ISO-8859-1 legible text conversion
# Scott Hanson (scott@int-res.com, shanson@mail.hh.provi.de)

# This code snippet converts 8-bit Mac characters to ISO-8859-1 equivalents
# or a legible 7-bit equivalent. I'm assuming that the source is
# scientific text where any Mac character might occur and should be translated,
# that we won't need to translate back to the Mac character set, and that
# it's OK if the result text is longer than the source.

# Generally, if a character does not have an ISO-8859-1 equivalent,
# -- bullets, daggers, etc. become *
# -- Greek letters and math symbols are spelled out or approximated
# -- punctuation is approximated
# -- accents are ignored

# This will eventually be part of a custom Quark-Tags-to-HTML script. There
# are no HTML tags in the output; the Latin-1 tags in the comments are
# just a handy way to identify the characters. Names for the other
# characters are just made up.

# Index into @trans is (Mac byte value - 128); each entry is the replacement.
@trans = (
"\304", "\305", "\307", "\311", "\321", "\326", "\334", "\341",   # Mac \200 - \207
# Ä  Å  Ç  É  Ñ  Ö  Ü  á
"\340", "\342", "\344", "\343", "\345", "\347", "\351", "\350",   # Mac \210 - \217
# à  â  ä  ã  å  ç  é  è
"\352", "\353", "\355", "\354", "\356", "\357", "\361", "\363",   # Mac \220 - \227
# ê  ë  í  ì  î  ï  ñ  ó
"\362", "\364", "\366", "\365", "\372", "\371", "\373", "\374",   # Mac \230 - \237
# ò  ô  ö  õ  ú  ù  û  ü
"*", "\260", "\242", "\243", "\247", "*", "\266", "\337",         # Mac \240 - \247
# dagger  °  ¢  £  §  bullet  ¶  ß
"\256", "\251", "(TM)", "\264", "\250", "<>", "\306", "\330",     # Mac \250 - \257
# ®  ©  trademark  ´  ¨  not_eq  Æ  Ø
"infinity", "\261", "<=", ">=", "\245", "\265", "delta", "Sigma", # Mac \260 - \267
# infinity  ±  less/eq  great/eq  ¥  µ  delta  Sigma
"Pi", "pi", "integral", "\252", "\272", "Omega", "\346", "\370",  # Mac \270 - \277
# Pi  pi  integral  ª  º  Omega  æ  ø
"\277", "\241", "\254", "sqrt", "f", "~=", "Delta", "\253",       # Mac \300 - \307
# ¿  ¡  ¬  radical  florin  apeq  Delta  «
"\273", "...", "\240", "\300", "\303", "\325", "OE", "oe",        # Mac \310 - \317
# »  ellipsis  nbsp  À  Ã  Õ  OE  oe
"-", "--", '"', '"', "'", "'", "\367", "*",                       # Mac \320 - \327
# endash  emdash  open_dqt  clos_dqt  open_sqt  clos_sqt  ÷  diamond
"\377", "Y", "/", "*", '"', '"', "fi", "fl",                      # Mac \330 - \337
# ÿ  Y.umlaut  fraction  circ.x  open_qt  clos_qt  fi.lig  fl.lig
"*", "\267", "'", '"', "o/oo", "\302", "\312", "\301",            # Mac \340 - \347
# dbldagger  ·  under_qt  under_dqt  per_mil  Â  Ê  Á
"\313", "\310", "\315", "\316", "\317", "\314", "\323", "\324",   # Mac \350 - \357
# Ë  È  Í  Î  Ï  Ì  Ó  Ô
"*", "\322", "\332", "\333", "\331", "i", "", "",                 # Mac \360 - \367
# apple  Ò  Ú  Û  Ù  dotless_i  circumflex  tilde
"\257", "", "", "", "", "", "", "" );                             # Mac \370 - \377
# ¯  breve  dot  ring_above  cedilla  hung_umlaut  ogonek  caron

# Replace every 8-bit character in $_ with its table entry.
# Capturing with $1 avoids the speed penalty of $&.
s/([\x80-\xff])/$trans[ord($1) - 128]/ge;

--
Scott Hanson <shanson@mail.hh.provi.de>
Asendorf, Germany
work: Inter-Research Science Publisher <scott@int-res.com>
http://www.int-res.com
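
For anyone who wants to try the lookup idea without pasting the whole table, here is a minimal sketch of the same technique wrapped in a subroutine. The hash %sample and the name mac_to_latin1 are made up for this illustration and are not part of the script above. %sample holds only five Mac byte values, and any 8-bit byte missing from it is passed through unchanged, which is an assumption of the sketch (the full table has an entry for every byte from \200 to \377).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the full 128-entry table: a few Mac byte
# values mapped to an ISO-8859-1 byte or a legible 7-bit replacement.
my %sample = (
    0x8A => "\344",      # a-umlaut: direct ISO-8859-1 equivalent
    0xA5 => "*",         # bullet: no equivalent, becomes *
    0xB0 => "infinity",  # infinity sign: spelled out
    0xC9 => "...",       # ellipsis: approximated
    0xF9 => "",          # breve: accent is dropped
);

# Return a converted copy of the argument instead of editing $_ in place.
# Bytes not listed in %sample are left as they are (sketch assumption).
sub mac_to_latin1 {
    my ($text) = @_;
    $text =~ s/([\x80-\xff])/exists $sample{ord($1)} ? $sample{ord($1)} : $1/ge;
    return $text;
}

# prints: r = infinity * see below...
print mac_to_latin1("r = \260 \245 see below\311\n");
```

Returning a copy rather than changing $_ makes the conversion easy to call from a larger script, for example once per line inside a while (<>) loop. With the complete table, the hash lookup could be replaced by the array index used in the post.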