
Re: [MacPerl-AnyPerl] Lookups and efficiency



At 2:34 PM 9/30/00, Todd Richmond wrote:
>Hi all,
>
>I'm trying to optimize a script that does lookups. Here's the 
>problem: each day the script processes between 1000 to 100,000+ new 
>records. I have to compare the record's unique identifier (a 5-8 
>digit integer) to a master list of over a million IDs. [snip] The 
>question is, what can I do to reduce the amount of memory required, 
>but still maintain the speed?

How about reading the master list into your hash in chunks?

##### pseudo-code:  ####

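$chunk_size = 100000;  # master IDs to hold in memory at a time
$marker = 0;           # byte offset of the next unread master ID
LOOP: for (;;) {
  open ID, $masterIDfile or die "Can't open $masterIDfile: $!";
  seek ID, $marker, 0;   # resume exactly where the last chunk ended
  my %hash;
  my $count = 0;         # reset for every chunk
  HASH: while (<ID>) {
    chomp;               # strip the newline so keys match the new IDs
    $hash{$_} = 1;
    last HASH if ++$count >= $chunk_size;
  }
  $marker = tell ID;
  close ID;

  last LOOP unless keys %hash;   # nothing left in the master list

  # One pass over @new_ids: matches go to @keepers, the rest are
  # requeued to be tested against the next chunk.
  TEST: for (0..$#new_ids) {
    $new_id = shift @new_ids;
    exists $hash{$new_id}
      and push (@keepers, $new_id)
        or push (@new_ids, $new_id);
  }

}

# @keepers now holds the IDs found in the master list; anything still
# in @new_ids matched no chunk. Append whichever list belongs there.
if (@keepers) {
  open ID, ">>$masterIDfile" or die "Can't append to $masterIDfile: $!";
  print ID join("\n", @keepers), "\n";
  close ID;
}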

##### end pseudo-code  ####
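
For anyone who wants to try the chunking idea on a small scale, here is a
self-contained sketch. The sub name split_by_master, the scratch file
master_ids_demo.txt, and the chunk size of 4 are all made up for the demo,
and it assumes one ID per line in the master file. It keeps the file open
between chunks instead of using seek/tell, which should amount to the same
thing.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: compare @ids against the master file, holding
# at most $chunk_size master IDs in memory at once. Returns two array
# refs: the IDs found in the file and the IDs not found.
sub split_by_master {
    my ($file, $chunk_size, @ids) = @_;
    my @found;
    open my $fh, '<', $file or die "Can't open $file: $!";
    while (@ids) {
        # Load the next chunk of master IDs into a small hash.
        my %hash;
        while (defined(my $line = <$fh>)) {
            chomp $line;
            $hash{$line} = 1;
            last if keys(%hash) >= $chunk_size;
        }
        last unless keys %hash;      # master list exhausted
        # Keep only the IDs this chunk did not match.
        my @rest;
        for my $id (@ids) {
            if (exists $hash{$id}) { push @found, $id }
            else                   { push @rest,  $id }
        }
        @ids = @rest;
    }
    close $fh;
    return (\@found, \@ids);
}

# Demo: a ten-line "master list" read four IDs at a time.
my $file = "master_ids_demo.txt";    # made-up scratch file
open my $out, '>', $file or die "Can't write $file: $!";
foreach my $id (10001 .. 10010) { print $out "$id\n" }
close $out;

my ($found, $missing) =
    split_by_master($file, 4, 10003, 99999, 10010, 12345);
print "found:   @$found\n";      # found:   10003 10010
print "missing: @$missing\n";    # missing: 99999 12345
unlink $file;
```

The trade-off is one pass over the new IDs per chunk, so a smaller chunk
saves memory at the cost of more passes. The loop stops early once every
new ID has been matched.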

Just an idea;

1;

- Bruce

__Bruce_Van_Allen___bva@cruzio.com__831_429_1688_V__
__PO_Box_839__Santa_Cruz_CA__95061__831_426_2474_W__

