At 08:46 -0700 10/1/00, owner-macperl-anyperl@macperl.org wrote:

On Sat, 30 Sep 2000 14:34:47 -0700, Todd Richmond
<todd@andrew2.stanford.edu> wrote:

>I'm trying to optimize a script that does lookups. Here's the
>problem: each day the script processes between 1,000 and 100,000+
>new records. I have to compare each record's unique identifier (a
>5-8 digit integer) to a master list of over a million IDs. If I
>haven't seen the ID before and the record meets my criteria, I
>process it and add the ID to the master list. Right now, the way I
>do this is to read the master list of IDs into a hash and then check
>whether the key exists as I work my way through the new records.
>...
>The question is, what can I do to reduce the amount of memory
>required but still maintain the speed?

# Step 1 - store the _new_ IDs (not the ones in the master database)
# in a hash.

my %new;

open NEW, "newRecords" or die "can't open newRecords: $!";
while (<NEW>) {
    chomp;
    my ($id, @rest) = split /\t/;
    $new{$id} = 1;
}
close NEW;

# Step 2 - now cycle through the master database one record at a
# time. If you match a master ID with a new ID, delete it from the
# hash (because you've already got that record on file).

open MASTER, "masterDB" or die "can't open masterDB: $!";
while (<MASTER>) {
    chomp;
    my ($id, @rest) = split /\t/;
    delete $new{$id} if $new{$id};
}
close MASTER;

# Step 3 - cycle through the new records a _second_ time, and
# process only the IDs that are still in the hash.

open NEW, "newRecords" or die "can't open newRecords: $!";
while (<NEW>) {
    chomp;
    my ($id, @rest) = split /\t/;
    if ($new{$id}) {
        # do your processing here, adding to the master DB if required
    }
}
close NEW;

I think you'll be pleasantly surprised.

Henry.
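
Wiring the three steps into one script looks roughly like the sketch
below. It is a sketch rather than Henry's exact posting: the
tab-delimited layout with the ID in the first field comes from the
snippets above, but the append to masterDB in step 3 is an assumption,
since the original deliberately leaves the processing step open.

#!/usr/bin/perl
# Sketch of the three-step lookup. Assumes one tab-delimited record
# per line with the ID in the first field; the "processing" in step 3
# is a hypothetical stand-in (append the whole record to masterDB).
use strict;
use warnings;

my %new;    # today's IDs, later pruned against the master list

# Step 1: collect the IDs from today's new records.
open my $new_fh, '<', 'newRecords' or die "newRecords: $!";
while (<$new_fh>) {
    chomp;
    my ($id) = split /\t/;
    $new{$id} = 1;
}
close $new_fh;

# Step 2: stream the master list once, deleting every ID already on
# file. Only the day's batch is ever held in memory; the million-row
# master list is never loaded.
open my $master_fh, '<', 'masterDB' or die "masterDB: $!";
while (<$master_fh>) {
    chomp;
    my ($id) = split /\t/;
    delete $new{$id};
}
close $master_fh;

# Step 3: re-read the new records and handle only the survivors.
open $new_fh, '<', 'newRecords' or die "newRecords: $!";
open my $out_fh, '>>', 'masterDB' or die "masterDB: $!";
while (my $line = <$new_fh>) {
    my ($id) = split /\t/, $line;
    chomp $id;                   # in case a line has no tab at all
    if (delete $new{$id}) {      # delete so duplicate IDs run once
        print $out_fh $line;     # hypothetical processing step
    }
}
close $new_fh;
close $out_fh;

The payoff is the memory profile: the hash peaks at one day's worth
of IDs (at most ~100,000 keys) instead of the million-plus master
IDs, while both files are still read sequentially, so the constant-
time hash lookups are preserved.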