On Tue, 23 May 00 23:01:37 -0000, Joel Rees wrote:

>>I had implemented them as closures, but using those in your own
>>code turned out to be not so trivial.
>
>Probably a stupid question, but what's a closure? (just point me to pod
>or whatever.)

It's not really a stupid question, except that it's in the FAQ. :-)

It's one of the more advanced things in Perl. I learned of it from the book "Advanced Perl Programming", page 56 and onward. But it's in the online docs too: see perlfaq7 ("Perl Language Issues"), "What's a closure?", and perlref, point 4. The example in perlref in particular is very cute.

It is, in short, a way to define a sub on the fly, while baking in some variables that can be set at creation time. Each created sub will have its own copy of those (static) variables. In fact, the ONLY difference between the different subs is the static data.

So how does it apply to my code? Well, as you may recall, the only difference between encoding/decoding for different character sets is in the encoding/decoding hashes. Same code, different (static) data.

So here is, more or less, how my stuff could be used:

    use UTF8::Simple;
    # install the generated subs under names of your own choosing
    *UTF8ToMac = UTF8::Simple::decoder('Macintosh');
    *WinToUTF8 = UTF8::Simple::encoder('Windows-1252');
    $mactext = UTF8ToMac(WinToUTF8($wintext));    # Win -> Mac

or

    # keep the generated subs in lexical variables (code references)
    my $UTF8ToMac = UTF8::Simple::decoder('Macintosh');
    my $WinToUTF8 = UTF8::Simple::encoder('Windows-1252');
    $mactext = $UTF8ToMac->($WinToUTF8->($wintext));

One of the major problems with the first approach is with "use strict": you need to predeclare

    use vars qw(*UTF8ToMac *WinToUTF8);

which isn't too obvious to most people. Even I tend to forget about it sometimes.

A few remarks:

* The name UTF8::Simple refers to two limitations on the character sets to be encoded/decoded:
  A) They must be single-byte character sets, or the encoding hash would get too big;
  B) The lower half (1 .. 127) *must* be ASCII-compatible.
  For speed, my encoding substitution pattern only tries to replace characters with character codes between 128 and 255, which are usually a minority in the string.

  /$pattern/o doesn't work "properly" inside closures. It compiles the regex the first time any one of the generated functions is called. For the rest of its lifetime, all functions will use that one pattern. Therefore, I use a predefined pattern [\000\200-\377] instead.

* The strings 'Macintosh' and 'Windows-1252' refer to encoding files, which need to be loaded into the encoding and decoding hashes the first time an encoding is used. These could be the text files, as downloaded from <ftp://ftp.unicode.org/Public/MAPPINGS/>, used directly and simply parsed every time. Or, I could reuse the ".enc" files as used by XML::Parser. The downside is that you would need to have XML::Parser (and these files) installed. I'm not convinced about the speed-up. One of the bigger problems in implementing this is finding out where those encoding files are!

It is difficult for people to provide alternatives to my code when they don't have anything to start from. So I will post the module code in a few days, but first I have some more cleaning up to do. Don't worry, the code won't be very big. Maybe even shorter than this post... :-)

--
	Bart.
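
For anyone who wants something concrete to start from, here is a minimal sketch of the closure idea described above. It is not the UTF8::Simple code (which has not been posted yet): the subs encoder() and decoder(), the %tables hash, and its deliberately tiny two-character mapping tables are all hypothetical stand-ins, and the strings are treated as plain byte strings. The point is only to show one piece of code shared by several generated subs, each with its own private table.

```perl
use strict;
use warnings;

# Hypothetical, deliberately tiny tables: high byte => UTF-8 byte sequence.
# Real tables would cover the whole range 128..255.
my %tables = (
    'Windows-1252' => { "\xE9" => "\xC3\xA9",     # e-acute, U+00E9
                        "\xFC" => "\xC3\xBC" },   # u-umlaut, U+00FC
    'Macintosh'    => { "\x8E" => "\xC3\xA9",     # e-acute, U+00E9
                        "\x9F" => "\xC3\xBC" },   # u-umlaut, U+00FC
);

# Returns a sub that converts text from $charset to UTF-8.
sub encoder {
    my ($charset) = @_;
    my $map = $tables{$charset} or die "Unknown charset '$charset'";
    # The anonymous sub closes over $map, so every generated sub
    # keeps its own table. The pattern is a fixed literal, so no /o.
    return sub {
        my ($text) = @_;
        $text =~ s/([\200-\377])/exists $map->{$1} ? $map->{$1} : '?'/ge;
        return $text;
    };
}

# Returns a sub that converts UTF-8 text back to $charset.
sub decoder {
    my ($charset) = @_;
    my $map = $tables{$charset} or die "Unknown charset '$charset'";
    my %rev = reverse %$map;    # UTF-8 byte sequence => single byte
    return sub {
        my ($text) = @_;
        # a lead byte followed by one or more continuation bytes
        $text =~ s/([\xC0-\xFD][\x80-\xBF]+)/exists $rev{$1} ? $rev{$1} : '?'/ge;
        return $text;
    };
}

my $WinToUTF8 = encoder('Windows-1252');
my $UTF8ToMac = decoder('Macintosh');
my $mactext   = $UTF8ToMac->($WinToUTF8->("caf\xE9"));   # Win -> Mac
```

In this sketch, characters missing from a table are replaced by '?'; that is an arbitrary choice for illustration, and a real module might prefer to warn or to leave them alone.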