Re: [MacPerl-AnyPerl] seperate large file into smaller files



At 2:24 PM 12/16/00, allan wrote:
>hello,
>... i want perl to seperate the file
>into smaller files and each of these files
>should be 1000 lines long.

[snip]

>#!perl -w
>
>$file = "hit.txt";
>open (FILEHANDLE, $file) or die $!;
>$lines= 0;
>divide_files();
>
>sub divide_files {
>  while (<FILEHANDLE>) {
>   if ($. >= $lines && $. < $lines+1000) {
>    open (NEWFILES, ">$lines.txt") or die $!;
>    print NEWFILES $_;
>    $lines += 1000;
>    divide_files();
>   }
>  }
>}
>

1. Speak out loud what this script is doing. You'll notice that right 
after you say "print the line to the new file", you say "add 1000 to 
the break number". This means that after the very first line is 
printed, the break number ($lines) jumps by 1000, so the next line 
to be printed is line 1000, after which the break jumps to 2000, and 
so on. Each new file ends up with a single line.

2. Calling divide_files() at the end of the while loop is completely 
unnecessary, because the while loop occupies the entire subroutine. 
Just let it loop.

3. Then there's the question of opening the correct new files 
(1000.txt, 2000.txt, etc). No reason to open the same file 1000 times 
in a row; it's repetitive stress even for a computer.
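
To see point 1 without touching any files, here is a small sketch that runs the original test over line numbers 1 to 3000. The loop variable $n is a stand-in for $. and the array @printed is only there for illustration; neither is in the original script.

```perl
#!perl -w
use strict;

# Simulate the original condition on line numbers only (no file I/O).
my $lines = 0;
my @printed;
for my $n (1 .. 3000) {                 # $n plays the role of $.
    if ($n >= $lines && $n < $lines + 1000) {
        push @printed, $n;              # the original would print this line
        $lines += 1000;                 # ...then move the break right away
    }
}
print "lines printed: @printed\n";      # lines printed: 1 1000 2000 3000
```

Only one line per thousand passes the test, which matches the walk-through in point 1.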

So, try something like this (which you could write as a subroutine if 
necessary):

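#!perl -w
my $file = "hit.txt";
my $lines = 0;    # lines read so far; also names the next output file
open BIG, $file or die "Can't open $file: $!\n";
open NEW, ">$lines.txt" or die "Can't open $lines.txt: $!\n";
while (<BIG>) {
     print NEW $_;
     next if $. % 1000;    # keep writing until a multiple of 1000 lines
     close NEW;
     last if eof(BIG);     # avoid an empty file after the final chunk
     $lines = $.;
     open NEW, ">$lines.txt" or die "Can't open $lines.txt: $!\n";
}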

__END__

Did the job on a 145,000 record, 23 MB file in under ten seconds 
(BBEdit-MacPerl-G4-9.0.4).

The line "close NEW;" is probably unnecessary, since re-opening NEW 
closes the previous file implicitly; the variable $lines could be 
factored out.
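
For what it's worth, here is a rough sketch of both ideas rolled into a subroutine. The name split_file, its $size argument and the returned file count are my own inventions for illustration, not part of the script above. It names each file from $. directly and relies on the implicit close when NEW is re-opened.

```perl
#!perl -w
use strict;

# Hypothetical helper: split $file into pieces of $size lines each,
# named 0.txt, 1000.txt, ... as in the script above.
sub split_file {
    my ($file, $size) = @_;
    my $count = 0;                          # number of files written
    open BIG, $file or die "Can't open $file: $!\n";
    while (<BIG>) {
        # Lines 1, $size+1, 2*$size+1, ... each start a new file.
        # Re-opening NEW closes the previous file implicitly.
        if (($. - 1) % $size == 0) {
            my $name = ($. - 1) . ".txt";
            open NEW, ">$name" or die "Can't open $name: $!\n";
            $count++;
        }
        print NEW $_;
    }
    close NEW if $count;
    close BIG;                              # also resets $. for the next call
    return $count;
}

# Same input file name as above; skipped if the file isn't there.
my $file = "hit.txt";
print split_file($file, 1000), " files written\n" if -e $file;
```

Because a new file is opened only when there is a line to put in it, an input whose length is an exact multiple of 1000 doesn't leave an empty file at the end.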

(Now what do I do with 145 thousand-line files?? ;-)

HTH

1;

- Bruce

__Bruce_Van_Allen___Santa_Cruz_CA__
