
[MacPerl] Re: Efficient Search in Perl?



>
>If it was me, I would sort the files by numerical order of the first field,
>and then use Search::Dict to do a binary search on the file.
>
>#!perl -wl
>use Search::Dict;
>open(FH, ':file') || die $!;
>$key = '        100002086';
>
>look(*FH, $key);
>$x = <FH>;
>print $x;
>
>This will print the first occurrence of the key, which is:
>
>        100002086,'life_form'
>
>Of course, take out the whitespace if it is not needed.  Search::Dict does
>very fast searching, even over large files, but only if the lines are in
>order (having the numbers in order is sufficient).

Chris,

I have taken both your and Michael Stoodt's advice and re-sorted and
rearranged my two text files.  Michael was right in assuming that I wanted
to look up an ID number in the first file and then use this ID to
reference the second file.  I have reimplemented my program using the
Search::Dict 'look' function and it runs much, much faster.  However, the
'look' function seems inaccurate: it fails to make an exact match in more
than 50% of the cases I have tried (even when the word exists).  Is there
any other function that performs the equivalent of an 'eq' for a string
match and '==' for an integer match using the DB structure in Perl?  I want
to be able to get 'supercomputer' on its own even if there is an entry
before it like 'parallel_computer'.

Here is an example of the newly sorted files...

Lookup:
aardvark,101593926
aardwolf,101627385
aba,102155313
aba,102155402
abaca,108656674
abaca,110725396
aback,400073303
aback,400073386
abactinal,301606838
abacus,102155519
abacus,102155652
abaft,400270799
abalone,101455925
abamp,109802897
abampere,109802897
abandon,103826829
abandon,105561743
abandon,200415168
abandon,200415625
abandon,201421290
abandon,201524047
abandon,201524319
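
For reference, 'look' only positions the filehandle at the first line that
is greater than or equal to the key; it does not test for equality itself.
An 'eq'-style match can be layered on top by reading from that position and
comparing the first field.  The following is a minimal sketch, not a
definitive implementation.  The name exact_lookup is hypothetical, and the
sketch assumes the lookup file is sorted in plain byte order on whole lines
(the order 'look' compares in by default).  If the file was sorted with a
different collation, that may be one reason 'look' appears to miss entries.

```perl
use strict;
use warnings;
use Search::Dict;

# Hypothetical helper: return every ID stored for exactly $word.
# Assumes lines look like "word,id" and are sorted in byte order.
sub exact_lookup {
    my ($fh, $word) = @_;
    my @ids;

    # Including the comma in the key keeps 'aba' from stopping at 'abaca'.
    # look() leaves $fh at the first line that is ge the key, or returns -1.
    return @ids if look($fh, "$word,") < 0;

    while (my $line = <$fh>) {
        chomp $line;
        my ($name, $id) = split /,/, $line, 2;
        # Stop at the first line whose word is not an exact match.
        last unless defined $name && $name eq $word;
        push @ids, $id;
    }
    return @ids;
}

# Usage (file name is an assumption):
#   open(my $fh, '<', 'lookup.txt') or die $!;
#   my @ids = exact_lookup($fh, 'aba');
```

With the sample lines above, a call for 'aba' would be expected to return
the two 'aba' IDs and nothing for 'abaca' or 'aback'.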

Reference:
100001740,'(anything having existence (living or nonliving))'
100002086,'(any living entity)'
100002880,'(living things collectively; "the oceans are teeming with life")'
100003011,'(a discrete unit of living matter)'
100003095,'(the basic structural and functional unit of all organisms; cells may exist as independent units of life (as in monads) or may form colonies or tissues as in higher plants and animals)'
100003731,'(any entity that causes events to happen)'
100004123,'(a human being; "there was too much for one person to do")'
100008019,'(a living organism characterized by voluntary movement)'
100008864,'(a living organism lacking the power of locomotion)'
100009457,'(a physical (tangible and visible) entity; "it was full of rackets, balls and other objects")'
100010123,'(an object occurring naturally; not made by man)'
100010572,'(that which has mass and occupies space; "an atom is the smallest indivisible unit of matter")'
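
The '==' case for the reference file can be handled the same way: let
'look' seek to the neighbourhood of the ID, then check the number on the
line it lands on.  This is only a sketch under stated assumptions.  The
name gloss_for_id is hypothetical, each record is assumed to sit on a
single line in the real file, and every ID is assumed to have the same
number of digits, so that the string order used by 'look' agrees with
numeric order.

```perl
use strict;
use warnings;
use Search::Dict;

# Hypothetical helper: return the text stored for exactly $id,
# or undef if that ID is not in the file.
sub gloss_for_id {
    my ($fh, $id) = @_;

    # look() leaves $fh at the first line that is ge "$id," or returns -1.
    return if look($fh, "$id,") < 0;

    my $line = <$fh>;
    return unless defined $line;    # key sorts after the last line
    chomp $line;

    # Split on the first comma only; the text may contain commas itself.
    my ($found, $text) = split /,/, $line, 2;
    return unless defined $found && $found =~ /^\d+$/ && $found == $id;
    return $text;
}

# Usage (file name is an assumption):
#   open(my $fh, '<', 'reference.txt') or die $!;
#   my $text = gloss_for_id($fh, 100002086);
```

An ID that falls between two entries returns undef rather than the text of
the next line, which is the difference from calling 'look' on its own.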

__________________________________________________
Developers                                   @ C=+

dev@cequel.co.uk         #include 'cheesyquote.h';
__________________________________________________


