
Re: [MacPerl] Memory leak?




In Regards to your letter <v03007803b0b21f1707bf@[204.255.183.108]>:
:  
:  It looks for duplicates using a hash system (not too accurate, I know, but
:  it works, and I figure it's much faster than using s// (am I wrong?)) and
:  outputs a database without duplicates. The file I'm parsing is 4.7mb, and
:  using 10mb of memory, Perl runs out. Is there a leak here, do hashes take
:  up a huge amount of space more than tab-delimited text, or what?

Hashes do take up space, and 'keys %hash' generates a list containing all
the keys (if I'm not mistaken), so doing that when the keys are your data
gives you yet another copy of the data in memory. Using each in a while loop
instead of foreach over keys will help in that department.  Also, as mentioned
before, appending the output to a variable gives you yet another copy of the
data in memory.
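
Roughly what I mean, as an untested-on-your-data sketch: walk the nested hash
with each and print every record straight to the filehandle instead of
collecting it in $output. The sub name write_records and the sample records
are made up for illustration; I'm only assuming %data has the same
user => { date => rest } layout as in your script.

```perl
#!perl
use strict;

# Hypothetical helper: write every record in a user => { date => rest }
# hash to the given filehandle, one tab-delimited line per record.
sub write_records {
    my ($fh, $data) = @_;
    my $count = 0;
    # each() hands back one key/value pair at a time, so no separate
    # list of all the keys is built the way 'keys %hash' builds one.
    while ( my ($user, $dates) = each %$data ) {
        while ( my ($date, $rest) = each %$dates ) {
            # Print right away instead of appending to a big $output string.
            print $fh "$user\t$date\t$rest\n";
            $count++;
        }
    }
    return $count;
}

# Made-up sample data, just to show the call.
my %data = (
    alice => { '01/01/98' => "120\t3\t0\trouter1" },
    bob   => { '01/02/98' => "45\t1\t1\trouter2" },
);
my $written = write_records(\*STDOUT, \%data);
```

In the real script you would pass the OUT filehandle (as \*OUT) rather than
STDOUT.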

As far as finding duplicates [call me blind, but I don't see it], it appears
that all your script does, as is, is list the LAST record from your input
file for each specific $user,$date combination.  It doesn't do anything about
duplicate data.
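
To show what I mean by "the LAST record", here is a tiny sketch with two
made-up input lines (the user, date and record text are invented). A plain
hash assignment simply overwrites whatever was stored before under the same
keys:

```perl
#!perl
use strict;

my %data;
# Two made-up input lines with the same user and date but different data.
my @lines = (
    "alice\t01/01/98\tfirst record",
    "alice\t01/01/98\tsecond record",
);
for my $line (@lines) {
    my ($user, $date, $rest) = split /\t/, $line, 3;
    # Plain assignment: a later line replaces the earlier one.
    $data{$user}{$date} = $rest;
}
my $kept = $data{alice}{'01/01/98'};
print "$kept\n";   # prints "second record"
```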

If the only duplicates you are concerned with are $user,$date, then you don't
need to keep more than one record per combination in memory.  Simply use the
first line that matches for each combination and ignore the rest, like this:

#!perl
for $i (0 .. $#ARGV) {
  open (IN, $ARGV[$i]) || die "couldn't open $ARGV[$i]: $!"; # open file dropped on droplet
  while (<IN>) {
    chomp;
    # a limit of 3 keeps all the remaining fields together in $rest
    ($user,$date,$rest) = split /\t/, $_, 3;
    # $rest = online (secs), logons, bad discos, router[s]
    # keep only the first record seen for each $user,$date combination
    $data{ $user }{ $date } = $rest unless exists $data{ $user }{ $date };
  }
  close IN;
}

while ( ($user,$hash) = each %data ) {
  while ( ($date,$value) = each %$hash ) {  # %$hash dereferences the inner hash ref
    print "$user\t$date\t$value\n";
  }
}


In any case, the output isn't really sorted: once you put the data into a hash
and read it back out, you get whatever order Perl wants to give you. You need
to sort it by user, by date, or both if you want it sorted.  And if that's not
an issue, you might as well print each record the first time you see it, and
not bother storing the data in a hash/array at all.

  while (<IN>) {
    chomp;
    ($user,$date,$rest) = split /\t/, $_, 3;
    # $rest = online (secs), logons, bad discos, router[s]
    # print only the first record per $user,$date; the hash holds just a counter
    print "$user\t$date\t$rest\n" if ! $data{ $user }{ $date }++;
  }
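
If you do want sorted output, the records have to stay in the hash until the
end, and something like the following sketch would order them by user and
then by date. The sub name sorted_lines and the sample records are made up,
and I'm assuming the dates are in a form that sorts correctly as plain
strings (the made-up ones are year-month-day); if yours are not, the inner
sort needs its own comparison routine.

```perl
#!perl
use strict;

# Hypothetical helper: return the records of a user => { date => rest }
# hash as tab-delimited lines, ordered by user and then by date.
sub sorted_lines {
    my ($data) = @_;
    my @lines;
    for my $user ( sort keys %$data ) {
        # Default sort compares as strings, so this only gives date
        # order when the date format sorts correctly as text.
        for my $date ( sort keys %{ $data->{$user} } ) {
            push @lines, "$user\t$date\t$data->{$user}{$date}";
        }
    }
    return @lines;
}

# Made-up sample data, deliberately out of order.
my %data = (
    bob   => { '1998-01-02' => 'b2', '1998-01-01' => 'b1' },
    alice => { '1998-01-03' => 'a3' },
);
my @lines = sorted_lines(\%data);
print "$_\n" for @lines;
```

Sorting does need the full list of keys in memory, so this trades some
memory for ordered output.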


:  for $i (0 .. $#ARGV) {
:  	open (IN, "$ARGV[$i]") || die "couldn't open file in"; # open file dropped on droplet
:  	while (<IN>) {
:  		chomp ( @info = split/\t/ ); # @info = id, date, online (secs), logons, bad discos, router[s]
:  		$user = shift( @info );
:  		$date = shift( @info );
:  		$data{ $user }{ $date } = join /\t/,@info; # now %data
:  	}
:  	close IN;
:  }
:  
:  foreach $user ( keys %data ) {
:  	foreach $date ( keys %{ $data{$user} } ) {
:  			$output .= "$user\t$date\t$data{ $user }{ $date }\n";
:  	}
:  }
:  
:  open (OUT, ">nodup.tab");
:  print OUT $output;
:  close OUT;


-Carl

Carl A Baltrunas <carl@reststop.com> and Cherie Marinelli <2bunnies@1unique.com>
Catalyst Industries: The One-Stop Internet registration and distribution service
URL: <http://www.reststop.com>    INFO: info@1unique.com
-owned by EWBR & EWBR-ette [our house bunnies] and Czazu [our dog]
 Visit them at their hotel at http://www.reststop.com/info/bunny/bunnycam.html


