
Re: [MacPerl-AnyPerl] cutting minutes

To: Allan Juul <aju@mondo.dk>
Subject: Re: [MacPerl-AnyPerl] cutting minutes
From: Ronald J Kimball <rjk@linguist.dartmouth.edu>
Date: Mon, 9 Oct 2000 09:13:48 -0400
Cc: macperl-anyperl@macperl.org
In-Reply-To: <395E2F3C7422D41193A600508BC802231F065B@hawk.mondo.dk>; from Allan Juul on Mon, Oct 09, 2000 at 10:15:37AM +0200
References: <395E2F3C7422D41193A600508BC802231F065B@hawk.mondo.dk>

On Mon, Oct 09, 2000 at 10:15:37AM +0200, Allan Juul wrote:
> hi
> 
> below i have a fully working script (on windows but you probably get the
> idea)
> 
> the script looks in 2.854 directories 44.257 in files (its a very large
> website) to find certain lines and then write these lines along with
> information regarding the line number/the file name and path to a txt-file
> 
> the only problem is the time it take to finish (about half an hour)
> so if anyone can point me in a direction where the script might be optimized
> i´d musch appriciate it
> 
> and as a sitenote - how can i correctly time the execution of this script.
> the included timing ($ialt)at the end are way off?

times() counts processor time rather than elapsed time.  If you want to
measure elapsed time, use time() instead.
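
A minimal sketch of the difference, assuming a hypothetical helper named time_it (not part of the original script). Note that time() only has one-second resolution, so this is a rough measurement rather than a precise benchmark:

```perl
use strict;
use warnings;

# Hypothetical helper: run a code ref and return (elapsed, cpu) in seconds.
sub time_it {
    my ($code) = @_;
    my $wall_start = time();        # wall-clock seconds since the epoch
    my $cpu_start  = (times)[0];    # user CPU seconds used by this process
    $code->();
    return (time() - $wall_start, (times)[0] - $cpu_start);
}

# sleep() waits without using the processor, so elapsed time grows
# while CPU time stays close to zero.
my ($elapsed, $cpu) = time_it(sub { sleep 2 });
print "elapsed: $elapsed s, cpu: $cpu s\n";
```

A script that spends most of its run waiting on disk, as this one probably does, shows the same gap between the two numbers.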

> #! perl -w
> 
> chdir("i:");	#change to relevant drive
> $start = (times)[0];
> $startroot = "txtfiles";
> $web = "some.txt";
> $extensions = "asp";	#asp-files
> $txt_folder = "$startroot";	#start root
> 
> &check_folders($txt_folder);	#call sub
> 
> foreach $file (@filelist) {	#loop through all possible files

Do you really need to build up the complete list of files before you start
processing?  It might be faster to process the files as you find them,
because you won't have to allocate the memory for the array.

> 		if ($file =~ /(.+\\)(\w+\.($extensions))$/ig) {

$extensions is constant over the life of the script, so you should use the
/o regex modifier here.
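
A small sketch of what /o does, using a hypothetical helper named has_wanted_extension (the name and the sample file names are illustrative only). The caveat at the end is the price of the optimization, so /o is only safe when the variable really never changes:

```perl
use strict;
use warnings;

my $extensions = "asp";

# Hypothetical helper: true if $name ends in the wanted extension.
# With /o the pattern is built from $extensions once, on the first match,
# and reused on every later call.
sub has_wanted_extension {
    my ($name) = @_;
    return $name =~ /\.(?:$extensions)$/io ? 1 : 0;
}

print has_wanted_extension("default.asp"), "\n";   # 1
print has_wanted_extension("default.htm"), "\n";   # 0

# Caveat: after the first match, changes to $extensions are ignored.
$extensions = "htm";
print has_wanted_extension("default.htm"), "\n";   # still 0
```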

> 			$splitme = $2	#get filename

$2 is the only group you're using from the regex, so the first and third
parenthesized groups are unnecessary.
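
For illustration, the match could be reduced to a single capture along these lines. The helper name asp_file_name and the sample path are assumptions made for this sketch; the pattern is the one from the script with the extra parentheses removed:

```perl
use strict;
use warnings;

my $extensions = "asp";

# Hypothetical helper: return the file name part of a backslash-separated
# path if it looks like name.asp, otherwise undef.
sub asp_file_name {
    my ($path) = @_;
    # Only the file name is captured, so it lands in $1 instead of $2.
    # If $extensions held an alternation such as "asp|htm", it would need
    # a non-capturing group: (?:$extensions)
    return $path =~ /.+\\(\w+\.$extensions)$/io ? $1 : undef;
}

my $name = asp_file_name('txtfiles\\shop\\default.asp');
print "$name\n";    # default.asp
```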

> 			open(SPLIT, $file) or die "can not open $file";
> #open file for reading
> 			@innersplit = <SPLIT>;
> 
> 				
> 			for ($i=0; $i<=$#innersplit; $i++){
> 				$count = $i +1;
> 
> 				$searchline = $innersplit[$i];
> 				if ($searchline =~
> /select[^\w<>]+(?!.+\border\b\s+\bby\b)/ig)	{

The /g modifier is unnecessary here.

> 				push(@txtarray, "in the file $splitme at
> line numer $count\nthe full path is: $file\nthe line says
> $searchline\n");

You don't need to assign the array element to $searchline:

if ($innersplit[$i] =~ /select[^\w<>]+(?!.+\border\b\s+\bby\b)/i) {
   push(@txtarray, "in the file $splitme at line number " . ($i+1) . "\n" .
                   "the full path is: $file\n" .
                   "the line says $innersplit[$i]\n");
}

> 				}
>      			}
> 		}
> }
> 
> open(SHOWER, ">$web") or die "cant write";	#collect all text in
> textfile
> print SHOWER @txtarray
> close(SHOWER);

Similarly, is there any reason you need to build up all of @txtarray,
instead of printing each line as you find it?

> sub check_folders {	#sub
> 
> 	my($dir) = @_;
> 
> 	local (*FOLDER);
>    	my(@subfiles, $file, $specfile);
> 	
> 	opendir(FOLDER, $dir) or die "cannot open $dir";	#open any
> directory
> 	@subfiles = readdir(FOLDER);
>       closedir(FOLDER);
> 	
> 		foreach $file (@subfiles) {
> #loop through all files in any direcory
> 				$specfile = $dir . "\\" . $file;
> 
> 			if (-d $specfile && $file !~ /^\.{1,2}$/) {
> 				&check_folders($specfile);
> #recursion
> 			}	
> 			if ((-f $specfile) && ($file =~
> /\w+\.$extensions$/ig)) {

You can call -f _ instead of -f $specfile to save a second system call for
each file.

/o should be used here too.
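
A minimal sketch of the -f _ idea, assuming a hypothetical helper named classify and a scratch directory created with File::Temp purely for demonstration. One thing to watch: the saved stat result is overwritten by any later file test, so the _ test has to run before recursing into a subdirectory (for example in an elsif), not after.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: report what kind of thing $path is.
sub classify {
    my ($path) = @_;
    return 'dir'  if -d $path;   # this test performs the stat() call
    return 'file' if -f _;       # _ reuses the result of that stat()
    return 'other';
}

# Demonstration on a scratch directory that is removed at exit.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/page.asp";
open(my $fh, '>', $file) or die "cannot create $file: $!";
close($fh);

print classify($dir),  "\n";    # dir
print classify($file), "\n";    # file
```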

> 					push(@filelist, $specfile);	#get
> file in array
> 				
> 	   		}
> 	}
> }
> 
> $end = (times)[0];
> $ialt = $end - $start;
> print $ialt;
> 	

So, I would rewrite the script to read in each file and output each matching
line as it comes up, instead of building up two large arrays.  That would
also save one regex match per filename.
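
A rough sketch of that restructuring, not a tested replacement for the original script. The sub names scan_folder and scan_file, the command-line handling, and the use of "/" as the path separator (which Perl on Windows also accepts) are assumptions made for this example; the search pattern and the report text come from the script above.

```perl
use strict;
use warnings;

my $extensions = "asp";

# Write every matching line of one file to $out as soon as it is read.
sub scan_file {
    my ($path, $name, $out) = @_;
    open(my $in, '<', $path) or die "cannot open $path: $!";
    while (my $line = <$in>) {
        if ($line =~ /select[^\w<>]+(?!.+\border\b\s+\bby\b)/i) {
            # $. is the current line number of the file being read
            print $out "in the file $name at line number $.\n",
                       "the full path is: $path\n",
                       "the line says $line\n";
        }
    }
    close($in);
}

# Walk $dir recursively and scan each wanted file when it is found,
# so neither a file list nor a result list is kept in memory.
sub scan_folder {
    my ($dir, $out) = @_;
    opendir(my $dh, $dir) or die "cannot open $dir: $!";
    my @entries = readdir($dh);
    closedir($dh);

    foreach my $name (@entries) {
        next if $name =~ /^\.{1,2}$/;       # skip . and ..
        my $path = "$dir/$name";
        if (-d $path) {
            scan_folder($path, $out);
        }
        elsif (-f _ && $name =~ /^\w+\.$extensions$/io) {
            # one regex per file name; -f _ reuses the stat from -d
            scan_file($path, $name, $out);
        }
    }
}

# Usage (script name is hypothetical): perl scan.pl txtfiles some.txt
if (@ARGV == 2) {
    my ($root, $report) = @ARGV;
    open(my $out, '>', $report) or die "cannot write $report: $!";
    scan_folder($root, $out);
    close($out);
}
```

Whether this is noticeably faster depends on where the half hour actually goes; if most of it is disk access, the saving from dropping the arrays may be small.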

Ronald
