In regard to your letter <v03007803b0b21f1707bf@[204.255.183.108]>:

: It looks for duplicates using a hash system (not too accurate, I know, but
: it works, and I figure it's much faster than using s// (am I wrong?)) and
: outputs a database without duplicates. The file I'm parsing is 4.7mb, and
: using 10mb of memory, Perl runs out. Is there a leak here, do hashes take
: up a huge amount of space more than tab-delimited text, or what?

Hashes do take up space, and 'keys %hash' also generates a list containing
the keys (if I'm not mistaken), so calling it when the keys are your data
gives you yet another copy of the data in memory. Using each instead of
foreach over keys will help in that department. Also, as mentioned before,
appending the output to a variable gives you yet another copy of the data
in memory.

As far as finding duplicates goes [call me blind, but I don't see it], it
appears that all your script does, as is, is keep the LAST record from your
input file for each $user,$date combination. It doesn't do anything about
duplicate data.

If the only duplicates you are concerned with are $user,$date, then you
don't need to keep any of the rest of the data in memory at all. Simply use
the first line that matches for each combination and ignore the rest, like:

    #!perl
    for $i (0 .. $#ARGV) {
        # open each file dropped on the droplet
        open (IN, $ARGV[$i]) || die "couldn't open $ARGV[$i]: $!";
        while (<IN>) {
            chomp;
            # whole record = id, date, online (secs), logons, bad discos, router[s]
            # the limit of 3 keeps every field after the date together in $rest
            ($user,$date,$rest) = split /\t/, $_, 3;
            # keep only the first record seen for each $user,$date
            $data{ $user }{ $date } = $rest unless exists $data{ $user }{ $date };
        }
        close IN;
    }

    while ( ($user,$hash) = each %data ) {
        while ( ($date,$value) = each %$hash ) {   # %$hash dereferences the inner hash
            print "$user\t$date\t$value\n";
        }
    }

In any case, the output isn't really sorted: once you put the records into
a hash and read them back out, you get whatever order Perl wants to give
you. If you want it sorted, you need to sort it yourself, by user, by date,
or both. And...
if that's not an issue, you might as well print each record as soon as you
read it, and not bother storing it in a hash/array. The hash then only has
to hold a count for each $user,$date:

    while (<IN>) {
        chomp;
        # whole record = id, date, online (secs), logons, bad discos, router[s]
        ($user,$date,$rest) = split /\t/, $_, 3;
        print "$user\t$date\t$rest\n" if ! $data{ $user }{ $date }++;
    }

: for $i (0 .. $#ARGV) {
:   open (IN, "$ARGV[$i]") || die "couldn't open file in"; # open file dropped on droplet
:   while (<IN>) {
:     chomp ( @info = split/\t/ ); # @info = id, date, online (secs), logons, bad discos, router[s]
:     $user = shift( @info );
:     $date = shift( @info );
:     $data{ $user }{ $date } = join /\t/,@info; # now %data
:   }
:   close IN;
: }
:
: foreach $user ( keys %data ) {
:   foreach $date ( keys %{ $data{$user} } ) {
:     $output .= "$user\t$date\t$data{ $user }{ $date }\n";
:   }
: }
:
: open (OUT, ">nodup.tab");
: print OUT $output;
: close OUT;

-Carl

Carl A Baltrunas <carl@reststop.com> and Cherie Marinelli <2bunnies@1unique.com>
Catalyst Industries: The One-Stop Internet registration and distribution service
URL: <http://www.reststop.com>   INFO: info@1unique.com
-owned by EWBR & EWBR-ette [our house bunnies] and Czazu [our dog]
 Visit them at their hotel at http://www.reststop.com/info/bunny/bunnycam.html
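
P.S. If you do want the output sorted, here is a minimal sketch of one way
to do it, assuming the records are already in a hash shaped like %data
above. The sample records, the each_sorted name, and the callback argument
are all made up for illustration, and the plain string sort on dates is
only right if your dates compare correctly as strings (YYYYMMDD, say).
Note that sort keys builds the key list in memory, so this trades some
memory for the ordering.

```perl
#!perl
use strict;

# Hypothetical sample data, shaped like %data in the scripts above:
# $data{user}{date} = the remaining tab-delimited fields.
my %data = (
    carl  => { '19980105' => "120\t2", '19980102' => "300\t1" },
    alice => { '19980103' => "45\t1" },
);

# Walk the records sorted by user, then by date within each user,
# handing each finished line to the $emit callback instead of
# collecting the whole report in one big string.
sub each_sorted {
    my ($data, $emit) = @_;
    foreach my $user ( sort keys %$data ) {
        foreach my $date ( sort keys %{ $data->{$user} } ) {
            $emit->( "$user\t$date\t$data->{$user}{$date}\n" );
        }
    }
}

# Print each line as it is produced.
each_sorted( \%data, sub { print $_[0] } );
```

To write to your output file instead, pass sub { print OUT $_[0] } as the
callback once OUT is open.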