
Re: [MacPerl] Porting Un*x perl scripts to MacPerl for CGI



Kent Cowgill wrote:

> their _file type code_ is actually 'TEXT' (I
> suppose I wasn't clear on that point in my previous message).  I assume
> MacPerl uses the file's type code to test -T, not (as someone suggested)
> the first few bytes of the file.

Not if MacPerl follows standard Perl behavior.

"The -T and -B switches work as follows.  The first block or so of
the file is examined for odd characters such as strange control codes
or characters with the high bit set.  If too many odd characters (>30%)
are found, it's a -B file, otherwise it's a -T file.  Also, any file
containing null in the first block is considered a binary file."
 - Camel2, p. 86
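
To make the quoted description concrete, here is a minimal sketch of
that heuristic in Perl.  It is not perl's actual implementation: the
names looks_like_text and file_looks_like_text, the 512-byte block
size, the choice of which bytes count as "odd", and treating an empty
buffer as text are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not perl's real code): classify a buffer using
# the rules quoted from the Camel book.
sub looks_like_text {
    my ($buf) = @_;
    return 1 if length($buf) == 0;        # assumption: empty counts as text
    return 0 if index($buf, "\0") >= 0;   # any null byte means binary

    # Assumption: "odd" means anything other than tab, newline, carriage
    # return, or printable ASCII (so high-bit bytes are odd too).
    my $odd = () = $buf =~ /[^\t\n\r\x20-\x7e]/g;

    # More than 30% odd bytes means binary.
    return ($odd / length($buf)) > 0.30 ? 0 : 1;
}

# Hypothetical wrapper: examine only the first block of a file.
sub file_looks_like_text {
    my ($path, $block_size) = @_;
    $block_size ||= 512;                  # assumption: "first block or so"
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode($fh);                         # read raw bytes, no translation
    my $buf = '';
    my $n = read($fh, $buf, $block_size);
    close($fh);
    die "Cannot read $path: $!" unless defined $n;
    return looks_like_text($buf);
}
```

Note that nothing here consults the Mac file type code; only the
bytes in the first block matter, which is the point of the quote.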

Yes, this is one of the cruftiest, ugliest, most hideous things ever
built into a language.  ("Or so"?  "Odd"?  "Strange"?!  These are
descriptions of an algorithm!?)  But hey, what the hell, determining
"text" vs. "binary" is pretty much a heuristic process anyway, so
you might as well use a test that is only a 99% solution.  Especially
when the 100% solution would be at least an order of magnitude more
expensive in time, and on many files, 4 to 5 orders of magnitude.

The moral of the story is that "text" vs. "binary" isn't a
well-defined distinction, and if your code assumes otherwise, rethink
your approach.  (Doing "while (<F>) { }" on a binary file will not
break anything, though it might take more memory than you expect;
but the same is true of a pathological text file.)
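
If the memory concern matters, one option is to skip line-oriented
reads and pull the file in fixed-size chunks, so a file with no
newlines never ends up in memory as one giant "line".  A minimal
sketch, where the name count_bytes_chunked and the 8192-byte default
chunk size are assumptions chosen for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: walk a file in fixed-size chunks and return the
# total number of bytes seen.  Memory use is bounded by $chunk_size no
# matter how the file's contents are laid out.
sub count_bytes_chunked {
    my ($path, $chunk_size) = @_;
    $chunk_size ||= 8192;                 # assumption: arbitrary default
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode($fh);                         # treat the contents as raw bytes
    my $total = 0;
    my $buf   = '';
    while (1) {
        my $n = read($fh, $buf, $chunk_size);
        die "Cannot read $path: $!" unless defined $n;
        last if $n == 0;                  # end of file
        $total += $n;                     # real code would process $buf here
    }
    close($fh);
    return $total;
}
```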

-- 
 Jamie McCarthy          http://www.absence.prismatix.com/jamie/
 jamie@nizkor.org     Director of Operations, The Nizkor Project
                                          http://www.nizkor.org/