
Re: [MacPerl-AnyPerl] Lookups and efficiency



At 08:46 -0700 10/1/00, owner-macperl-anyperl@macperl.org wrote:
On Sat, 30 Sep 2000 14:34:47 -0700, Todd Richmond
<todd@andrew2.stanford.edu> wrote:

>I'm trying to optimize a script that does lookups. Here's the
>problem: each day the script processes between 1000 and 100,000+ new
>records. I have to compare the record's unique identifier (a 5-8
>digit integer) to a master list of over a million IDs. If I haven't
>seen the ID before and the record meets my criteria I process it and
>add the ID to the master list. Right now, the way I do this is to
>read the master list of IDs into a hash, and then check to see if the
>key exists as I'm working my way through the new records.
>...
>The question is, what can I do to reduce the amount of
>memory required, but still maintain the speed?

# Step 1 - store the _new_ IDs (not the ones in the master database)
# in a hash.

my %new;
open(NEW, "< newRecords") or die "Can't open newRecords: $!";
while (<NEW>)
{
  chomp;
  my ($id) = split /\t/, $_, 2;   # only the ID field is needed here
  $new{$id} = 1;
}
close NEW;
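A minimal sketch of Step 1 wrapped in a subroutine so it can be tried on an in-memory filehandle (`open` on a scalar reference, Perl 5.8+). The name `load_new_ids` is hypothetical, and it assumes tab-separated records with the ID in the first field, as in the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: read tab-separated records from $fh and return a
# hash ref whose keys are the IDs found in the first field.
sub load_new_ids {
    my ($fh) = @_;
    my %new;
    while (my $line = <$fh>) {
        chomp $line;
        # Limit of 2: split off the ID without splitting the rest of the record.
        my ($id) = split /\t/, $line, 2;
        next unless defined $id && length $id;   # skip blank lines
        $new{$id} = 1;
    }
    return \%new;
}

# Usage with an in-memory string standing in for "newRecords".
my $data = "101\tfoo\n202\tbar\n101\tdup\n";
open my $fh, '<', \$data or die "Can't open in-memory file: $!";
my $new = load_new_ids($fh);
close $fh;
print join(',', sort keys %$new), "\n";   # prints 101,202
```

The split limit matters a little at 100,000+ records a day, since the rest of each record is never broken into fields it doesn't need.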

# Step 2 - now cycle through the master database one record at a time.
# If you match a master ID with a new ID, delete it from the hash
# (because you've already got them on file).

open(MASTER, "< masterDB") or die "Can't open masterDB: $!";
while (<MASTER>)
{
  chomp;
  my ($id) = split /\t/, $_, 2;
  delete $new{$id};   # harmless if $id isn't one of the new IDs
}
close MASTER;
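Step 2 can be sketched the same way. `drop_known_ids` is a hypothetical name, and the early `last` is an optional shortcut that is not in the code above: once every new ID has been matched there is no reason to read the rest of the master file.

```perl
use strict;
use warnings;

# Hypothetical helper: stream the master file one line at a time and
# delete every ID that is already on file from %$new. Only the hash of
# new IDs is held in memory; the master list never is.
sub drop_known_ids {
    my ($new, $master_fh) = @_;
    while (my $line = <$master_fh>) {
        chomp $line;
        my ($id) = split /\t/, $line, 2;
        next unless defined $id;
        delete $new->{$id};     # deleting a missing key is harmless
        last unless %$new;      # optional shortcut: nothing left to match
    }
    return scalar keys %$new;   # number of IDs still unseen
}

# Usage with an in-memory string standing in for "masterDB".
my %new = map { $_ => 1 } qw(101 202 303);
my $master = "202\told\n404\tother\n";
open my $mfh, '<', \$master or die "Can't open in-memory file: $!";
my $remaining = drop_known_ids(\%new, $mfh);
close $mfh;
print "$remaining unseen: ", join(',', sort keys %new), "\n";   # prints 2 unseen: 101,303
```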

# Step 3 - cycle through the new records a _second_ time, and
# process only the IDs that are new.

open(NEW, "< newRecords") or die "Can't open newRecords: $!";
while (<NEW>)
{
  chomp;
  my ($id) = split /\t/, $_, 2;
  if (delete $new{$id})   # delete so an ID repeated in the batch is handled once
  {
    ;  # do your processing here, adding to master DB if required
  }
}
close NEW;
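A sketch of Step 3, assuming the processing is passed in as a callback and the master DB is a plain file that processed records are appended to. `process_unseen` and the callback are hypothetical; the in-memory `$appended` string stands in for "masterDB" opened for append. It also assumes that when an ID appears twice in one day's batch, the first occurrence is the one to process.

```perl
use strict;
use warnings;

# Hypothetical helper: reread the new records and call $process->($id, $line)
# once for each ID still left in %$new after Step 2. delete() returns the
# removed value, so each ID is handled at most once even if it repeats.
sub process_unseen {
    my ($fh, $new, $process) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        chomp $line;
        my ($id) = split /\t/, $line, 2;
        next unless defined $id && delete $new->{$id};
        $process->($id, $line);
        $count++;
    }
    return $count;
}

# Usage: IDs 101 and 303 survived Step 2; 202 was already on file.
my %new = (101 => 1, 303 => 1);
my $records  = "101\tfoo\n202\tbar\n303\tbaz\n101\tdup\n";
my $appended = '';
open my $in,  '<', \$records  or die "Can't open input: $!";
open my $out, '>', \$appended or die "Can't open output: $!";
my $done = process_unseen($in, \%new, sub {
    my ($id, $line) = @_;
    # ... record-specific processing and criteria checks would go here ...
    print {$out} "$line\n";   # add the processed record to the master
});
close $in;
close $out;
print "processed $done\n";   # prints processed 2
```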

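I think you'll be pleasantly surprised: the hash now holds only the
day's new IDs (1000 to 100,000+) instead of the million-plus master
IDs, at the cost of reading the new records file twice.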

Henry.

