
Re: [MacPerl] Re: Efficient Search in Perl?



Hi folks,

A question was asked about efficient searches of word lookup tables:

>The program is searching two very large text files.  The first file is a
>word lookup which has a list of all words and their associated key value.
>The words are not arranged in any kind of alphabetical order.
>	100001740,'entity'
>	100001740,'something'
>	100002086,'life_form'
>	100002086,'organism'
>
>The second file takes the same format, but contains a list of all word
>descriptions and their associated key values.  Here is an example:
>	100001740,'(anything having existence (living or nonliving))'
>	100002086,'(any living entity)'
>	100002880,'(living things collectively; "the oceans are teeming

What is unclear is what you want to accomplish with these lookups. It looks
like a huge dictionary translating words into definitions.  In that case, I
think the most direct solution would be a one-time translation of the two
files into a single Un*x DBM file, keyed on the words ('entity', etc.) with
the definitions as the values. Then your "search" becomes a quick keyed
lookup, which is probably all it really is.
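
Here is a rough sketch of that one-time translation, in case it helps. It
is only a sketch under my own assumptions: one record per line in the
id,'text' format quoted above, the whole definitions file fitting in memory
during the conversion, and a word that appears under several ids getting
its definitions joined with newlines. The names parse_line, build_index and
convert_files are made up for illustration, and AnyDBM_File simply uses
whichever DBM library your Perl was built with (some of them limit the
size of a stored value).

```perl
use strict;
use Fcntl;    # O_RDWR, O_CREAT for tie()

# Parse one line of the form  100001740,'entity'  into (id, text).
# Returns an empty list for lines that do not match (assumed format).
sub parse_line {
    my ($line) = @_;
    return $line =~ /^\s*(\d+),'(.*)'\s*$/ ? ($1, $2) : ();
}

# Fill %$dict with word => definition from two open filehandles.
sub build_index {
    my ($words_fh, $defs_fh, $dict) = @_;

    # Pass 1: remember each definition by its numerical id.
    my %def_for_id;
    while (my $line = <$defs_fh>) {
        my ($id, $def) = parse_line($line) or next;
        $def_for_id{$id} = $def;
    }

    # Pass 2: store each word with the definition its id points to.
    while (my $line = <$words_fh>) {
        my ($id, $word) = parse_line($line) or next;
        next unless exists $def_for_id{$id};
        # Assumption: several senses of one word are joined with newlines.
        $dict->{$word} = exists $dict->{$word}
            ? "$dict->{$word}\n$def_for_id{$id}"
            : $def_for_id{$id};
    }
    return $dict;
}

# One-time conversion: tie a hash to a DBM file and fill it.
sub convert_files {
    my ($words_path, $defs_path, $dbm_path) = @_;
    require AnyDBM_File;
    my %dict;
    tie(%dict, 'AnyDBM_File', $dbm_path, O_RDWR | O_CREAT, 0644)
        or die "Cannot open DBM file $dbm_path: $!";
    open(my $words_fh, '<', $words_path) or die "Cannot read $words_path: $!";
    open(my $defs_fh,  '<', $defs_path)  or die "Cannot read $defs_path: $!";
    build_index($words_fh, $defs_fh, \%dict);
    close($words_fh);
    close($defs_fh);
    untie %dict;
}
```

You would run something like convert_files('words.txt', 'defs.txt',
'dictionary') once (file names hypothetical). After that, any script that
ties a hash to the same DBM file can fetch a definition with $dict{'entity'}.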

There is a little section on the DBM routines in the on-line MacPerl book,
to which I've been referring as I learn MacPerl myself! Thanks to all
involved... it's been a real help.

Is there some odd reason to keep the numerical keys?

Re: writing your own hash. The Un*x DBM database routines are a pretty
good implementation of hashing that hides the internal workings very
nicely. I'd be inclined to use the built-in hashing and not reinvent the
wheel, though I agree that hashing is the best solution.

Re: binary search. I love binary search as much as anyone, but the
numerical fields are not unique in the first file. Can the built-in binary
search work on ASCII-sorted text? That could work as well if you swapped
the numerical ID and the words in file #1, and alphabetized.
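
For what it's worth, here is a sketch of that swapped-and-sorted approach
using the look() routine from the standard Search::Dict module, which does
a binary search on a sorted text file. I can't say whether that is the
binary search meant above. I am assuming file #1 has been rewritten as
plain word,id lines (quotes dropped) and sorted in ASCII order; the name
ids_for_word is made up for illustration.

```perl
use strict;
use Search::Dict;    # exports look()

# Return every id listed for $word in a sorted file of "word,id" lines.
sub ids_for_word {
    my ($fh, $word) = @_;

    # look() binary-searches the file and leaves $fh positioned at the
    # first line that is string-wise >= $word; it returns -1 on error.
    return () if look($fh, $word) < 0;

    my @ids;
    while (defined(my $line = <$fh>)) {
        chomp $line;
        # All lines beginning with $word are adjacent in a sorted file,
        # so stop at the first line that no longer starts with it.
        last unless index($line, $word) == 0;
        # Keep exact matches only (skips longer words sharing the prefix).
        push @ids, $1 if $line =~ /^\Q$word\E,(\d+)$/;
    }
    return @ids;
}
```

Because a word may appear under more than one id, the routine returns a
list rather than a single value. Each id could then be looked up in the
second file the same way, provided that file is sorted on its ids as text.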

Eric Hsu


