
[MacPerl] Re: Tie::SubstrHash fails for very large hashes



Could someone run my sample code below and let me know whether you 
get similar results?

Also, roughly speaking, how much more efficient is a Tie::SubstrHash 
than just a plain generic hash?
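
For a sense of scale, here is the arithmetic behind the "over 10 MB" figure in my original message, using the record count and field lengths from the test script quoted below. It is only a rough sketch: it counts the raw key and value bytes and nothing else, so it says nothing about what either a plain hash or a tied one actually allocates on top of that.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Figures taken from the quoted test script.
my $records = 114_862;   # number of key/value pairs
my $keylen  = 13;        # bytes per key
my $vallen  = 86;        # bytes per value

# Raw payload only: no per-entry overhead of any kind is included.
my $payload = $records * ($keylen + $vallen);

printf "%d bytes (%.1f MB) of raw key/value data\n",
    $payload, $payload / (1024 * 1024);
# prints "11371338 bytes (10.8 MB) of raw key/value data"
```

Whatever the tied table saves has to come out of the overhead beyond this payload, which is the part I can't estimate.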

>I checked the archives and didn't find anything about this issue.
>
>I have been working recently on a project involving merging monthly 
>update files into a large flat-file database. Both the updates and 
>the original db are stored as fixed-length records in text files, so 
>I read them into two large hashes and then reconcile the data.
>
>I discovered, though, that I wound up with less than 20% of the data 
>coming back out.
>
>The following snippet demonstrates the error:
>
>#------------------------------
>#! /usr/local/bin/perl5
>
>require Tie::SubstrHash;
>
>print "Here we go!\n";
>$hashsize = 114_862;		# arbitrary values from my data set
>tie %test, "Tie::SubstrHash", 13, 86, $hashsize;
>
>for ($i = 1; $i <= $hashsize; $i++) {
>	$key1 = $i + 100_000;		# fix to uniform 6-digit numbers
>	$key2 = "abcdefg$key1";
>	$test{$key2} = ("abcdefgh" x 10) . "$key1";
>}
>
>print scalar(keys %test), "\n";
>
>print %test, "\n";
>print $test{"abcdefg207250"}, "\n";
>print((keys %test), "\n");	# extra parens so "\n" is printed too
>print "\nDone.\n";
>#-----------------------------
>
>If you remove the "require" and "tie" statements, everything works 
>fine, provided you've given MacPerl a large enough memory allocation. 
>(The hash alone is over 10 MB, not counting any "bookkeeping 
>overhead.") Of course, you'll get a spew of garbled output from the 
>"print %test" and "print (keys %test)" commands.
>
>However, if you leave the code as is, on my system (PowerMac G4, 
>single 500 MHz, MacOS 9.0.4, MacPerl 5.2.0r4), you get exactly ONE 
>key/value pair back for the whole hash. It is specifically the one 
>with key "abcdefg207251". I tried pulling out the previous element 
>by name, and that worked fine, but both (%test) and (keys %test) 
>come out with only a single line of data.
>
>In my actual program, there's another hash with a key length of 6 
>instead of 13, but still a value length of 86; that one ends up with 
>only 18% of its data appearing in (keys %hash).
>
>My program works fine without using the Tie::SubstrHash module, but 
>it would still be nice to have it available for other programs.
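
If anyone wants numbers rather than a screenful of output, the test above can be boiled down to two counts: how many records come back when fetched by name, and how many turn up when walking the table. The sketch below does that. Note that check_table() is a helper name I made up for this message, not part of Tie::SubstrHash, and the smaller table size in the usage line is an arbitrary choice so that it runs quickly.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Tie::SubstrHash;

# check_table() is a hypothetical helper, not part of Tie::SubstrHash.
# It fills a tied table with $n fixed-length records (13-byte keys,
# 86-byte values, as in the test script), then returns two counts:
# records found by direct lookup, and keys seen by iteration.
sub check_table {
    my ($n) = @_;
    tie my %test, 'Tie::SubstrHash', 13, 86, $n;

    my $filler = "abcdefgh" x 10;               # 80 bytes of padding

    for my $i (1 .. $n) {
        my $num = $i + 100_000;                 # uniform 6-digit numbers
        $test{"abcdefg$num"} = $filler . $num;
    }

    # Count records that come back intact when fetched by name.
    my $by_lookup = 0;
    for my $i (1 .. $n) {
        my $num = $i + 100_000;
        my $val = $test{"abcdefg$num"};
        $by_lookup++ if defined $val && $val eq $filler . $num;
    }

    # Count keys seen when walking the table with each().
    my $by_iter = 0;
    while (defined(my $key = each %test)) {
        $by_iter++;
    }

    return ($by_lookup, $by_iter);
}

# Arbitrary small size for a quick run; use 114_862 to match my data set.
my ($by_lookup, $by_iter) = check_table(1_000);
print "stored 1000, found by lookup: $by_lookup, by iteration: $by_iter\n";
```

On a healthy installation I would expect both counts to match the number stored. The failure I described would show up as the iteration count falling far short while the lookup count stays correct.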

-- 
Linc Madison  *  San Francisco, CA  *  LincPerl@LincMad.com
NO SPAM: California Bus & Prof Code Section 17538.45 applies!
