[Date Prev][Date Next][Thread Prev][Thread Next] [Search] [Date Index] [Thread Index]

Re: [MacPerl] Re: Efficient Search in Perl?



This looks to me like a natural for a single-I/O lookup, where you 
build the hash buckets with a hashed key (divided by # of buckets,
and take the remainder).  Collisions are handled in RAM by a clever
device involving a one-byte field per bucket which records which
records have been forced out to secondary locations.  How you choose
where to put the overflows does matter, but not much as long as the
overflows from bucket n don't all go to bucket m.

As you need more space in a hash bucket, you throw out records with
signatures of 254, 253, 252, ....  A signature is just another hash
down to a number from 0-254.  The byte table records the low water
mark for its bucket.  After you find the bucket this key should be
in, you check the table to see if he's still there by comparing his
signature to the low-water mark.  If not, find his next secondary 
bucket and proceed.  No I/O yet.  But if so, go read that bucket.
If this guy is in the table, that's the bucket he's in!

One, single, I/O.  Guaranteed!

Leave twenty percent or more empty space and the loading will never
get too intensive in rearranging which buckets things are in.

This method does not handle rapidly changing databases very well, 
but it whips any method that can't cut the I/Os per transaction to
ONE.  With a thousand buckets, a binary search should take about
ten I/Os, so this is ten times faster.

I've threatened to write this in Perl, but I'll be delighted to try
out somebody else's coding of it if anybody wants to try it.  If it
takes over a hundred lines of code, I'll be surprised.  Does it not
address the problem rather directly?

Dick

***** Want to unsubscribe from this list?
***** Send mail with body "unsubscribe" to mac-perl-request@iis.ee.ethz.ch