
Re: [MacPerl] Re: Efficient Search in Perl?

To: mac-perl@iis.ee.ethz.ch
Subject: Re: [MacPerl] Re: Efficient Search in Perl?
From: Eric Hsu <erichsu@uclink.berkeley.edu>
Date: Thu, 19 Mar 1998 16:07:42 -0800
In-Reply-To: <9803192030.AA16238@cfcl.com>

Hi folks,

A question was asked about efficient searches of word lookup tables:

>The program is searching two very large text files.  The first file is a
>word lookup which has a list of all words and their associated key value.
>The words are not arranged in any kind of alphabetical order.
>	100001740,'entity'
>	100001740,'something'
>	100002086,'life_form'
>	100002086,'organism'
>
>The second file takes the same format, but contains a list of all word
>descriptions and their associated key values.  Here is an example:
>	100001740,'(anything having existence (living or nonliving))'
>	100002086,'(any living entity)'
>	100002880,'(living things collectively; "the oceans are teeming

What is unclear is what you want to accomplish with these lookups. It
looks like a huge dictionary translating words into definitions. In that
case, I think the most direct solution would be a one-time translation of
the two files into a single Un*x DBM file, keyed on the unique words
('entity', etc.) with their definitions as values. Then your "search" would
become a quick keyed lookup, which is probably what it really is anyway.
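A minimal sketch of that one-time conversion, written for a current Perl 5
(three-argument open, lexical filehandles), so an older MacPerl may need
small adjustments. It assumes both files use exactly the id,'text' layout
quoted above, one record per line, and it uses the core SDBM_File module
as the DBM implementation (DB_File or another DBM would do as well). The
names parse_line, build_dictionary and the command-line arguments are
hypothetical. Because one word can appear under several IDs, the sketch
keeps all of a word's definitions, joined by newlines.

```perl
use strict;
use warnings;
use Fcntl;        # exports O_RDWR, O_CREAT
use SDBM_File;    # DBM implementation bundled with Perl

# Split a line like 100001740,'entity' into (id, text); empty list if it doesn't match.
sub parse_line {
    my ($line) = @_;
    return $line =~ /^\s*(\d+),'(.*)'\s*$/ ? ($1, $2) : ();
}

# Fill %$dict (a plain or DBM-tied hash) with word => definition pairs.
sub build_dictionary {
    my ($dict, $words_fh, $defs_fh) = @_;

    # Pass 1: index definitions by numeric ID (held in memory while building).
    my %def_by_id;
    while (my $line = <$defs_fh>) {
        my ($id, $text) = parse_line($line) or next;
        $def_by_id{$id} = $text;
    }

    # Pass 2: attach each word to the definition for its ID.
    while (my $line = <$words_fh>) {
        my ($id, $word) = parse_line($line) or next;
        next unless exists $def_by_id{$id};
        my $def = $def_by_id{$id};
        # A word listed under several IDs keeps all its definitions.
        $dict->{$word} = exists $dict->{$word} ? "$dict->{$word}\n$def" : $def;
    }
}

# Hypothetical usage: perl build_dict.pl words.txt defs.txt dictdb
if (@ARGV == 3) {
    my ($words_file, $defs_file, $db_name) = @ARGV;
    open my $wfh, '<', $words_file or die "Can't open $words_file: $!";
    open my $dfh, '<', $defs_file  or die "Can't open $defs_file: $!";
    tie my %dict, 'SDBM_File', $db_name, O_RDWR | O_CREAT, 0666
        or die "Can't create $db_name: $!";
    build_dictionary(\%dict, $wfh, $dfh);
    untie %dict;
}
```

SDBM_File limits each key/value pair to roughly 1 KB, so if some words
collect very long definitions, a DBM without that limit (such as DB_File)
would be the safer choice.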

There is a little section on the DBM routines in the on-line MacPerl book,
to which I've been referring as I learn MacPerl myself! Thanks to all
involved... it's been a real help.

Is there some odd reason to keep the numerical keys?

Re: writing your own hash. The Un*x DBM database routines are a pretty good
implementation of hashing that hides the internal workings very nicely. I
agree that hashing is the best solution, but I'd be inclined to use the
built-in DBM hashing rather than reinvent the wheel.
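To illustrate how little code the built-in route needs: once a DBM
dictionary exists, a lookup is just an ordinary access on a tied hash. A
minimal sketch, assuming the dictionary was stored with SDBM_File under
some file name of your choosing; lookup_words is a hypothetical helper. It
ties the file once and then does as many lookups as you like.

```perl
use strict;
use warnings;
use Fcntl;        # exports O_RDONLY (and O_RDWR, O_CREAT)
use SDBM_File;

# Look up several words in an existing SDBM dictionary.
# Returns one definition per word, or undef where the word is absent.
sub lookup_words {
    my ($db_name, @words) = @_;
    tie my %dict, 'SDBM_File', $db_name, O_RDONLY, 0
        or die "Can't open $db_name: $!";
    # Each access is a hashed lookup, not a scan of the file.
    my @defs = map { $dict{$_} } @words;
    untie %dict;
    return @defs;
}
```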

Re: binary search. I love binary search as much as anyone, but the
numerical fields in the first file are not unique. Can the built-in binary
search work on ASCII-sorted text? If so, that could work as well: swap the
numerical ID and the word on each line of file #1, then sort alphabetically.
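If the "built-in binary search" meant here is the look function from
Perl's standard Search::Dict module (an assumption on my part), then yes:
with its dictionary-order and case-folding options left off, look
binary-searches a file sorted in plain ASCII (cmp) order and leaves the
filehandle at the first line that is not less than the key. The sketch
below does the swap and sort, then collects every ID for a word; the
helpers swap_and_sort and ids_for_word are hypothetical, and the sorted
lines would be written to a file once and reused.

```perl
use strict;
use warnings;
use Search::Dict;   # standard module; exports look()

# Turn lines like 100001740,'entity' into entity,100001740 and sort them
# in plain ASCII (cmp) order, which is what look() expects by default.
sub swap_and_sort {
    my @swapped;
    for my $line (@_) {
        next unless $line =~ /^\s*(\d+),'(.*)'\s*$/;
        push @swapped, "$2,$1\n";
    }
    return sort @swapped;
}

# Binary-search the sorted file open on $fh and return every ID for $word.
sub ids_for_word {
    my ($fh, $word) = @_;
    my $prefix = "$word,";
    # look() seeks to the first line ge $prefix; it returns -1 on error.
    return if look($fh, $prefix) < 0;
    my @ids;
    while (my $line = <$fh>) {
        last unless index($line, $prefix) == 0;   # left the block for $word
        chomp $line;
        push @ids, substr($line, length $prefix);
    }
    return @ids;
}
```

Sorting is the one-time cost; after that each lookup touches only a
logarithmic number of blocks of the file instead of reading all of it,
and the consecutive lines for a word give you every ID it appears under.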

Eric Hsu




References:
- Re: [MacPerl] Re: Efficient Search in Perl?
  - From: dick@cfcl.com (Dick Karpinski)
