At 2:34 PM 9/30/00, Todd Richmond wrote:
>Hi all,
>
>I'm trying to optimize a script that does lookups. Here's the
>problem: each day the script processes between 1000 to 100,000+ new
>records. I have to compare the record's unique identifier (a 5-8
>digit integer) to a master list of over a million IDs.
[snip]
>The question is, what can I do to reduce the amount of memory
>required, but still maintain the speed?

How about reading the master list into your hash in chunks?

##### pseudo-code: ####

# Assumes $masterIDfile names the master list (one ID per line),
# @new_ids holds the day's new IDs, and matches collect in @keepers.
my $chunk_size = 100_000;   # master IDs held in memory at once
my $marker     = 0;         # byte offset where the next chunk starts

LOOP: for (;;) {
    open ID, $masterIDfile or die "Can't open $masterIDfile: $!";
    seek ID, $marker, 0;            # resume where the last chunk ended

    # Pull the next chunk of master IDs into a hash.
    my %hash;
    my $count = 0;
    HASH: while (<ID>) {
        chomp;
        $hash{$_} = 1;
        if (++$count >= $chunk_size or eof ID) {
            $marker = tell ID;      # remember where this chunk ended
            last HASH;
        }
    }
    close ID;

    last LOOP unless keys %hash;    # master list exhausted

    # Test each remaining new ID against this chunk. Matches go on
    # @keepers; the rest go back on @new_ids for the next chunk.
    TEST: for (1 .. scalar @new_ids) {
        my $new_id = shift @new_ids;
        if (exists $hash{$new_id}) { push @keepers, $new_id }
        else                       { push @new_ids, $new_id }
    }
}

open ID, ">>$masterIDfile" or die "Can't append to $masterIDfile: $!";
print ID join("\n", @keepers), "\n";
close ID;

##### end pseudo-code ####

Just an idea;

1;

- Bruce

__Bruce_Van_Allen___bva@cruzio.com__831_429_1688_V__
__PO_Box_839__Santa_Cruz_CA__95061__831_426_2474_W__
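P.S. To make the sketch self-contained, here's roughly how @new_ids
might get filled before the loop runs. This part is untested, and the
file name new_ids.txt (one ID per line) is only a guess at how the
day's records arrive:

##### sketch: ####

my $newIDfile = 'new_ids.txt';   # assumed name and format, for illustration

open NEW, $newIDfile or die "Can't open $newIDfile: $!";
chomp(my @new_ids = <NEW>);      # one ID per line
close NEW;

my @keepers = ();

# ... run the chunked-lookup loop above here; it moves the IDs it
#     finds in the master list onto @keepers and leaves the rest
#     on @new_ids ...

print scalar(@keepers), " IDs were found in the master list\n";
print scalar(@new_ids), " were not found in any chunk\n";

##### end sketch ####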