Hi all,

I'm trying to optimize a script that does lookups. Here's the problem: each day the script processes anywhere from 1,000 to 100,000+ new records. I have to compare each record's unique identifier (a 5-8 digit integer) against a master list of over a million IDs. If I haven't seen the ID before and the record meets my criteria, I process it and add the ID to the master list.

Right now, I read the master list of IDs into a hash and then check whether the key exists as I work my way through the new records. This works fairly quickly: 10-15 seconds to load the hash and then ~10 minutes to process 100,000 records (depending on how many meet the criteria). The problem, of course, is that I have to allocate a huge amount of memory to MacPerl (>120 MB) to hold the hash. I'd like to do this more efficiently, especially since I can foresee a time in the not-so-distant future when the master list will no longer fit in memory.

The question is: what can I do to reduce the amount of memory required while still maintaining the speed? I tried using Tie::SubstrHash (forcing all the IDs to be 8-digit integers). It definitely used less memory (about 1/3 of what the plain hash needed), but it took almost 5 minutes to load the hash (and then gave me an error when I tried to check for the existence of a key...).

Am I going to have to go to a database solution? If so, does anyone have any suggestions? I would imagine that querying a database for every ID is going to be significantly slower than checking for the existence of a hash key. True?

Thanks,
Todd

--
*****************************************************************
Dr. Todd Richmond
Carnegie Institution of Washington
260 Panama Street
Stanford, CA 94305
Email: todd@andrew2.stanford.edu
Homepage: http://cellwall.stanford.edu/todd
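
For anyone following along, the in-memory approach described above amounts to roughly the following. This is only a minimal sketch and not the actual script: the helper names `load_master` and `process_new`, the sample IDs, and the criteria callback are all made up for illustration, and file I/O is left out.

```perl
use strict;
use warnings;

# Hypothetical helper: build the lookup hash from a list of known IDs.
sub load_master {
    my @ids = @_;
    my %seen;
    @seen{@ids} = ();    # hash slice: keys only, values stay undef
    return \%seen;
}

# Hypothetical helper: return the new IDs that are not yet in the master
# hash and that pass the caller-supplied criteria check. Each accepted ID
# is added to the master hash as it is processed.
sub process_new {
    my ($seen, $new_ids, $meets_criteria) = @_;
    my @processed;
    for my $id (@$new_ids) {
        next if exists $seen->{$id};           # already in the master list
        next unless $meets_criteria->($id);    # record fails the criteria
        $seen->{$id} = undef;                  # add to the master list
        push @processed, $id;
    }
    return @processed;
}

# Made-up sample data: three known IDs, four incoming IDs (one repeated).
my $seen = load_master(10001, 2345678, 87654321);
my @new  = process_new(
    $seen,
    [10001, 55555, 12345678, 55555],
    sub { $_[0] != 12345678 },    # stand-in for the real criteria
);
print "@new\n";    # prints "55555"
```

The `exists` test is what keeps each lookup fast; the memory cost comes from holding every ID as a hash key at once.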