
[MacPerl] Efficient Search in Perl?




Chris,

You helped me out a while back with porting a word lookup program to
MacPerl.  I am pleased to say that the program is now fully functional,
but I have a problem: the Perl component runs each time a user loads a
new web page, and it takes about 90 seconds to complete (on a 200 MHz
PowerMac)!!

The program searches two very large text files.  The first file is a
word lookup: a list of all words, each with its associated key value.
The words are not arranged in alphabetical order.  Here is an example:
	100001740,'entity'
	100001740,'something'
	100002086,'life_form'
	100002086,'organism'
	100002086,'being'
	100002086,'living_thing'
	100002880,'life'
	100003011,'biont'
	100003095,'cell'
	100003731,'causal_agent'
	100003731,'cause'
	100003731,'causal_agency'
	100004123,'person'
	100004123,'individual'
	100004123,'someone'
	100004123,'somebody'
	100004123,'mortal'
	100004123,'human'
	100004123,'soul'
	100008019,'animal'
	100008019,'animate_being'
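
To make the layout concrete, here is a small sketch of how one such line splits into its key and its word. It assumes one record per line, a numeric key, and single quotes around the text, which is what the sample above shows; the sub name parse_record is just a placeholder for illustration.

```perl
use strict;

# Split one record of the form  key,'text'  into its two fields.
# Returns an empty list if the line does not match that layout.
sub parse_record {
    my ($line) = @_;
    # Drop a trailing CR, LF or CRLF, whichever the platform wrote.
    $line =~ s/[\r\n]+$//;
    # Optional leading whitespace, digits, a comma, then quoted text.
    return $line =~ /^\s*(\d+),'(.*)'$/ ? ($1, $2) : ();
}

my ($key, $word) = parse_record("\t100002086,'life_form'\n");
print "$key => $word\n";
```

The same pattern should apply to the description file, provided each description sits on a single line.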

The second file has the same format, but lists all word descriptions
with their associated key values.  Here is an example:
	100001740,'(anything having existence (living or nonliving))'
	100002086,'(any living entity)'
	100002880,'(living things collectively; "the oceans are teeming with life")'
	100003011,'(a discrete unit of living matter)'
	100003095,'(the basic structural and functional unit of all organisms; cells may exist as independent units of life (as in monads) or may form colonies or tissues as in higher plants and animals)'
	100003731,'(any entity that causes events to happen)'
	100004123,'(a human being; "there was too much for one person to do")'
	100008019,'(a living organism characterized by voluntary movement)'
	100008864,'(a living organism lacking the power of locomotion)'
	100009457,'(a physical (tangible and visible) entity; "it was full of rackets, balls and other objects")'
	100010123,'(an object occurring naturally; not made by man)'
	100010572,'(that which has mass and occupies space; "an atom is the smallest indivisible unit of matter")'
	100011575,'(any substance that can be metabolized by an organism to give energy and build tissue)'
	100011937,'(a man-made object)'
	100012704,'(one of a class of artifacts; "an article of clothing")'
	100012865,'(a feature of the mental life of a living organism)'
	100013018,'(a general concept formed by extracting common features from specific examples)'

The first file is over 3 MB and the second is over 9 MB.  At the moment all
I am doing is reading each file line by line in a while loop and using
Perl's 'eq' to do a string match.  Can you think of a smarter way of doing
this?
I have thought of first sorting both files so that at least the two key
fields are in the same order, but I'm not sure whether even this would help.
Algorithms and Data Structures never was my strong point!
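
For reference, the line-by-line lookup described above amounts to something like the sketch below. It is a simplified outline and not the actual script: the sub names scan_file and describe are placeholders, the file paths are passed in by the caller, and it assumes one record per line.

```perl
use strict;

# Scan a file of  key,'text'  records from top to bottom and return
# every record whose chosen field (0 = key, 1 = text) equals $want.
sub scan_file {
    my ($path, $field, $want) = @_;
    my @hits;
    open(my $in, '<', $path) or die "Cannot open $path: $!";
    while (defined(my $line = <$in>)) {
        $line =~ s/[\r\n]+$//;                      # strip the line ending
        next unless $line =~ /^\s*(\d+),'(.*)'$/;   # skip malformed lines
        my @rec = ($1, $2);
        push @hits, [@rec] if $rec[$field] eq $want;  # plain string match
    }
    close($in);
    return @hits;
}

# Look a word up in the word file, then fetch the description for each
# key it maps to. Every call to scan_file reads a whole file again.
sub describe {
    my ($word, $words_file, $gloss_file) = @_;
    my @out;
    foreach my $hit (scan_file($words_file, 1, $word)) {
        push @out, map { $_->[1] } scan_file($gloss_file, 0, $hit->[0]);
    }
    return @out;
}
```

A call would look like describe('cell', 'words.txt', 'glosses.txt'), where both file names are made up for the example. Each lookup costs one full pass over the word file plus one full pass over the description file for every key found.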

Any help is, as always, greatly appreciated.


Shyam.

__________________________________________________
Developers                                   @ C=+

dev@cequel.co.uk         #include 'cheesyquote.h';
__________________________________________________


