Re: [MacPerl] Searching a VERY large text file



On Thu, 5 Feb 1998 18:33:00 -0800 (PST), Brian "L." Matthews wrote:

>Note that when you read line by line, the input is already buffered, so
>all you're really doing is increasing the size of the buffer, and doing
>the buffer management in Perl instead of C.

No, there are in fact two reasons why it might (and probably will) be
faster:

  1) You're combining all the code you normally run for each line into
one single Perl command. That command can now do its work at top
speed, scanning a reasonably large buffer.

  2) If you're doing a line by line search, you're searching through the
file twice (!). That's right: the first time to look for the end-of-line
characters, the second time for your search pattern.
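To make the two points concrete, here is a minimal sketch of a block-wise
literal search. The sub name find_in_file and the 64K default block size
are my own choices for illustration, not anything from the original
poster's script. It reads fixed-size blocks and lets a single index() call
scan each one, keeping a short tail so a match that straddles two blocks
isn't missed.

```perl
use strict;
use warnings;

# Return the byte offset of the first occurrence of the literal string
# $needle in the file at $path, or -1 if it does not occur.
sub find_in_file {
    my ($path, $needle, $chunk_size) = @_;
    die "Empty search string\n" if $needle eq '';
    $chunk_size ||= 64 * 1024;

    # A match can straddle two blocks by at most length-1 bytes.
    my $overlap = length($needle) - 1;

    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode $fh;

    my $buf  = '';
    my $base = 0;    # file offset of the first byte held in $buf
    while (1) {
        # Append the next block after the tail kept from the last round.
        my $got = read($fh, $buf, $chunk_size, length $buf);
        die "Read error on $path: $!" unless defined $got;
        last if $got == 0;

        # One builtin call scans the whole buffer.
        my $hit = index($buf, $needle);
        if ($hit >= 0) {
            close $fh;
            return $base + $hit;
        }

        # Keep only the tail that could begin a straddling match.
        my $keep = $overlap < length($buf) ? $overlap : length($buf);
        $base += length($buf) - $keep;
        $buf = $keep ? substr($buf, -$keep) : '';
    }
    close $fh;
    return -1;
}
```

If you need the matching line rather than the offset, you can seek back
from the returned offset to the previous end-of-line, so the line
splitting is only done once, for the hit.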

>If you're really worried about speed, you'll get *far* more improvement
>by dumping the silly linear search and either doing a binary search on
>the text file or converting the text file to a DBM file and searching
>using that.

It depends. If you have to search through the same file many times,
you're right. If you need to do this on different files every time,
it's plain silly, as the preprocessing time is now far greater than the
actual search will ever be (for a binary search, the file must be
*sorted*).
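For the case where the same file is searched many times, a binary search
might be sketched as follows. This leans on the core Search::Dict module,
whose look() positions a filehandle at the first line that is greater than
or equal to the key. The sub name sorted_file_has_line is made up for this
example, and it assumes one key per line with the file already sorted in
plain string (cmp) order; on an unsorted file the answer is simply wrong.

```perl
use strict;
use warnings;
use Search::Dict;    # core module: binary search in a sorted text file

# Return 1 if the sorted file at $path contains a line equal to $key,
# 0 otherwise. Assumes string-order sorting, one key per line.
sub sorted_file_has_line {
    my ($path, $key) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";

    # look() seeks to the first line that sorts at or after $key
    # and returns that position, or -1 on error.
    my $pos = look($fh, $key);
    die "look() failed on $path\n" if $pos < 0;

    # The next line read is the only candidate for an exact match.
    my $line = <$fh>;
    close $fh;
    return 0 unless defined $line;    # key sorts after every line
    chomp $line;
    return $line eq $key ? 1 : 0;
}
```

The sort itself is the preprocessing cost mentioned above, so this only
pays off when that cost is spread over many lookups.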

But coming back to the original question: it might be a better idea to
look for C source code for efficient file-searching tools. If you need
regular expressions, look for a grep derivative (I know DDJ released
source code for one); if it's a plain literal search you want, nothing
can beat Boyer-Moore (where parts of the text being searched are skipped).
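To show what "skipped" means, here is a rough Perl sketch of Horspool's
simplification of Boyer-Moore. Note that it is the simplified variant, not
the full algorithm, and the sub name horspool_search is my own. It is meant
to illustrate the idea only: in pure Perl the substr() calls will eat most
of the gain, which is exactly why the C versions are worth hunting down.

```perl
use strict;
use warnings;

# Horspool's simplification of Boyer-Moore. Returns the offset of the
# first occurrence of $needle in $haystack, or -1 if there is none.
sub horspool_search {
    my ($haystack, $needle) = @_;
    my $m = length $needle;
    my $n = length $haystack;
    return 0  if $m == 0;
    return -1 if $m > $n;

    # For every character of the needle except the last one, record the
    # distance from its rightmost occurrence to the end of the needle.
    my %shift;
    for my $i (0 .. $m - 2) {
        $shift{ substr($needle, $i, 1) } = $m - 1 - $i;
    }

    my $pos = 0;
    while ($pos <= $n - $m) {
        # Compare the current window against the needle, right to left.
        my $j = $m - 1;
        $j-- while $j >= 0
            && substr($haystack, $pos + $j, 1) eq substr($needle, $j, 1);
        return $pos if $j < 0;

        # Shift by the table entry for the window's last character. A
        # character that is not in the table lets us skip the full
        # needle length without looking at the bytes in between.
        my $last = substr($haystack, $pos + $m - 1, 1);
        $pos += exists $shift{$last} ? $shift{$last} : $m;
    }
    return -1;
}
```

The longer the search string, the bigger the jumps, so long literal
patterns benefit the most.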

You might also look up GROUSE. There was an article in DDJ last
November, where a guy described a nifty state machine (the kernel is
written in Intel assembler, but it's just a little) to search through
text very fast. Source is available on his site,
ftp://ftp.grouse.com.au/pub/wc/ . The demo code includes a grep and a
Boyer-Moore-like tool.

Yeah yeah, I know, it's off-topic. But we MacPerlers should keep an open
mind, and put as many tools into our belts as we can. ;-)

	Bart.
