Thanks for the info on closures, Bart. I do want to make one thing clear to list readers who aren't familiar with the problems of encoding UNICODE for UNIX.

UTF-8 is an encoding scheme that allows a variable-length (multi-byte) encoding for character sets, and it is (somewhat) compatible with a number of parsing techniques used in standard UNIX tools. Some UNIX implementations are applying it to UNICODE. It is called UTF-8 because the basic parse unit is the octet. (We usually call octets bytes, but "byte" has certain ambiguities.)

UTF-8 uses the high bits of the lead octet (byte) to show how many octets make up the character, and therefore how many remain to be read before you have a complete character. It also gives every following octet a fixed high-bit pattern, so a continuation octet can never be mistaken for a lead octet. A character cannot extend beyond six octets in the original design, so you don't have to worry about getting caught in an endless loop just parsing past one character. You can also find the character boundaries even if you start scanning in the middle, whether going forward or backward.

Why not make everything a constant 2-byte width and be done with it? Two bytes give at most 65,536 distinct codes. Early estimates for CJKV (Chinese, Japanese, Korean, and Vietnamese) were about 3,000 characters each for everyday use, with lots of overlap, so the 20,000 allocated in UNICODE were thought to be enough. But the real numbers are coming in, and each is counting more than 50,000. And the overlap is not as useful as we want to believe.

How about making everything a constant 32 bits wide? Other than political problems, it might work. Might not. A lot of the core algorithms in UNIX (and other systems) depend on a character being roughly the width of a byte.

Anyway, UTF-8 is not a character set; it is an encoding scheme intended to let major UNIX applications run without choking on the large character sets. UNICODE has been mapped onto the encoding scheme.
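For readers who want to see the lead-octet and continuation-octet rules in action, here is a minimal sketch in Python. The function names are hypothetical helpers, not part of any library. It assumes the original up-to-six-octet UTF-8 design; current UTF-8 (RFC 3629) stops at four octets and also rejects overlong forms, which this sketch does not check.

```python
from typing import List


def sequence_length(lead: int) -> int:
    """Total octets in a UTF-8 sequence, read from the lead octet's high bits."""
    if lead < 0x80:                      # 0xxxxxxx: single octet (ASCII)
        return 1
    if lead < 0xC0:                      # 10xxxxxx: continuation, not a lead
        raise ValueError("continuation octet, not a lead octet")
    n, mask = 0, 0x80
    while lead & mask:                   # count the leading 1 bits
        n += 1
        mask >>= 1
    if n > 6:                            # 0xFE / 0xFF never start a sequence
        raise ValueError("invalid lead octet")
    return n


def is_continuation(octet: int) -> bool:
    """Continuation octets always carry the high-bit pattern 10xxxxxx."""
    return octet & 0xC0 == 0x80


def char_start(data: bytes, i: int) -> int:
    """Scan backward from any index to the lead octet of its character."""
    while i > 0 and is_continuation(data[i]):
        i -= 1
    return i


def next_char(data: bytes, i: int) -> int:
    """Scan forward from any index to the start of the next character."""
    i += 1
    while i < len(data) and is_continuation(data[i]):
        i += 1
    return i


def split_chars(data: bytes) -> List[bytes]:
    """Walk forward, letting each lead octet say how far to read.

    Sketch only: it does not verify that the octets it skips over
    really are continuation octets.
    """
    out = []
    i = 0
    while i < len(data):
        n = sequence_length(data[i])
        out.append(data[i:i + n])
        i += n
    return out


if __name__ == "__main__":
    data = b"a\xc3\xa9\xe6\x97\xa5"      # "a", U+00E9, U+65E5 in UTF-8
    print(split_chars(data))
    print(char_start(data, 4), next_char(data, 4))
```

Because the scan backward or forward only has to skip octets of the form 10xxxxxx, and a character is never longer than the lead octet allows, dropping into the middle of a string costs at most a few octets of scanning before you are back on a character boundary.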
Joel Rees
----------------------------------------
Keeping the Faith
<joel_rees@sannet.ne.jp> <http://www.page.sannet.ne.jp/joel_rees>
(free account:) <reiisi@nettaxi.com> <http://www.nettaxi.com/citizens/reiisi>
----------------------------------------