
SV: [MacPerl-AnyPerl] optimizing a beginners script



dear thomas,

thanks very much for the suggestions!
still, i think my one remaining problem is this (IN CAPITALS):


   foreach $line (@search) {
   # ....
   # extract image-file and dir name (match for 'gif' or 'jpg' endings, 
   # or in an HTML file, match <IMG src="xxx"> tags),
   # if match:
   # check if image-file exists in 'pic', 

HOW DO I CHECK THE IMAGE FILE IF I DON'T ITERATE OVER THE DIRECTORY THAT
CONTAINS THE IMAGES?
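
One way to check for a single file without reading the directory is Perl's file test operators. What follows is a minimal sketch, assuming the images sit directly inside 'pic' as in the original script. The helper name image_exists and the sample file name are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $name is a plain file inside $dir.
# No directory listing is read; the file system is asked directly.
sub image_exists {
    my ($dir, $name) = @_;
    my $found = -f "$dir/$name";
    return $found ? 1 : 0;
}

# 'pic' is the directory from the original script; the file name is
# only an example value.
my $file = "FILENAME.GIF";
if (image_exists("pic", $file)) {
    print "found pic/$file\n";
} else {
    print "pic/$file is missing\n";
}
```

The -f test follows the case rules of the file system, so whether FILENAME.GIF finds filename.gif depends on the platform.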


(Or did I get this wrong?)
NO, CORRECT!

If the text file is actually an HTML-file, I would use HTML::Parser 
for parsing it (<IMG> tags, I guess).

WELL, IT'S ACTUALLY A DUMP FROM AN SQL DATABASE COMBINED WITH SOME
IMAGE NAMES :)

/(=\"?\/?.*?(\b[a-z0-9_]*?\b))?\/?($dollar)\"?/ig)

Please always provide an example (the actual line and the desired 
result of the match) or comment your regular expression

EX: <IMG SRC="SOMEDIR/FILENAME.GIF" MORE STUFF>

If the file was matched, it would be more efficient to exit the inner 
foreach with the 'last;' statement

WHICH INNER FOREACH?
AGAIN, I THINK MY PROBLEM IS HOW TO ITERATE OVER BOTH THE TXT FILE AND THE
ARRAY THAT CONTAINS THE IMAGE FILE NAMES AT THE SAME TIME
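
A common way to avoid walking two lists at once is to load one of them into a hash and look names up while looping over the other. The sketch below assumes this approach fits the data. The helper build_lookup, the sample data, and the simplified pattern are illustrative assumptions, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a list of file names into a lookup table.
# Keys are lower-cased so FILENAME.GIF and filename.gif compare equal
# (an assumption; drop the lc calls if case matters on your system).
sub build_lookup {
    my @names = @_;
    my %seen;
    $seen{ lc $_ } = $_ for @names;   # value keeps the spelling on disk
    return \%seen;
}

# In the real script @pics would come from readdir(); sample data here.
my @pics   = ("logo.gif", "photo.jpg", "map.gif");
my @search = (
    '<IMG SRC="SOMEDIR/LOGO.GIF" MORE STUFF>',
    'a line without any image',
    '<IMG SRC="OTHERDIR/missing.jpg">',
);

my $lookup = build_lookup(@pics);

foreach my $line (@search) {
    # Simplified pattern (assumption): optional "dir/" then name.gif or name.jpg
    next unless $line =~ m{(?:(\w+)/)?(\w+\.(?:gif|jpg))}i;
    my ($dir, $name) = ($1, $2);
    my $on_disk = $lookup->{ lc $name };
    if (defined $on_disk) {
        print "match: $on_disk (directory: ", defined $dir ? $dir : "none", ")\n";
    } else {
        print "not in pic: $name\n";
    }
}
```

Each hash lookup takes roughly constant time, so the work grows with the number of lines plus the number of pictures rather than with their product.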

again
thanks very much
allan

original message below:
_______________________


Hi Allan.

>hi,
>
>using the copy module i've written a (beginner's) script, which will look in a
>txt file to detect images (gifs or jpgs) which at the same time also occur
>in a directory named "pic". when found in the txt file it will create a
>directory with the relevant name and copy the relevant image into the
>directory just created. the whole lot goes into a directory called "copied".

If this is what you want, why are you iterating over all files in the 
'pic' directory?
Let's look at the worst case:

3000 files in 'pic' * 3000 lines in your text file  =  9.000.000 
iterations (Oops, need a CRAY? :-)

It would be much more efficient to iterate over the lines in your text file:

   foreach $line (@search) {
   # ....
   # extract image-file and dir name (match for 'gif' or 'jpg' endings,
   # or in an HTML file, match <IMG src="xxx"> tags),
   # if match:
   # check if image-file exists in 'pic',
   # mkdir (if the image exists),
   # copy file from 'pic' (if exists) to new dir
   #....
   # max. 3.000 iterations (IMHO)
   }

(Or did I get this wrong?)
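
The outline above could be fleshed out roughly as follows. This is a sketch under stated assumptions, not a tested replacement for the script: extract_image and copy_images are hypothetical helper names, the pattern is a simplified stand-in for the original regex, and the dump is assumed to contain references of the form dir/name.gif.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Path qw(mkpath);

# Hypothetical helper: pull (directory, file name) out of one line.
# The pattern is a simplified assumption, not the original regex.
sub extract_image {
    my ($line) = @_;
    return unless $line =~ m{(\w+)/(\w+\.(?:gif|jpg))}i;
    return ($1, $2);
}

# Hypothetical driver: one pass over the lines, at most one copy per line.
# Returns the number of files copied.
sub copy_images {
    my ($src_dir, $dest_root, @lines) = @_;
    my $copied = 0;
    foreach my $line (@lines) {
        my ($dir, $name) = extract_image($line) or next;
        next unless -f "$src_dir/$name";       # image not present: skip
        my $target = "$dest_root/$dir";
        mkpath($target) unless -d $target;     # also creates missing parents
        if (copy("$src_dir/$name", "$target/$name")) {
            $copied++;
        } else {
            warn "copy of $name failed: $!";
        }
    }
    return $copied;
}

# File and directory names below are the ones used in the original script.
if (open(my $fh, "<", "dump.txt")) {
    my @search = <$fh>;
    close($fh);
    my $n = copy_images("pic", "copied", @search);
    print "$n file(s) copied\n";
}
```

With 3.000 lines this performs at most 3.000 pattern matches and file tests, instead of one pass over the text file per picture.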


>it does seem to work but i would greatly appreciate it if someone could come
>up with a few pointers on how to optimize the script - it is very slow when
>tested on large numbers of images (2-3000) (the txt file is also large, 3.000
>lines), so that's why i've put a timer in the script.
>
>thanks in advance
>allan
>
>#!/pack/collect/bin/perl
>
>$start = (times)[0];
>
>opendir(COPYDIR, "pic") or  die "cant open pic";
>@pics = readdir(COPYDIR);
>closedir(COPYDIR);
>
>open(FRONTHTML, "dump.txt")or die "cant open dump.txt";
>@search = <FRONTHTML>;

If the text file is actually an HTML-file, I would use HTML::Parser 
for parsing it (<IMG> tags, I guess).

>
>
>for ($i=0; $i<=$#pics; $i++)
>	{
>	$dollar = @pics[$i];
>	if ($dollar =~ /[a-z0-9_]\.(gif|jpg)/ig)
>		{
>		foreach $line (@search)
>			{
>			if ($line =~ /(=\"?\/?.*?(\b[a-z0-9_]*?\b))?\/?($dollar)\"?/ig)

Please always provide an example (the actual line and the desired 
result of the match) or comment your regular expression
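
As an illustration of a commented regular expression, the /x modifier lets a pattern be spread over several lines with a note on each part. The pattern below is a simplified stand-in written for the example line <IMG SRC="SOMEDIR/FILENAME.GIF" MORE STUFF>; it is an assumption about what the match should capture, not a drop-in replacement for the regex in the script.

```perl
use strict;
use warnings;

# Simplified, commented pattern (an assumption, see the text above).
# With /x, whitespace and #-comments inside the pattern are ignored.
my $img_re = qr{
    SRC \s* = \s*               # the SRC attribute and its equals sign
    "?                          # optional opening quote
    (?: ([\w-]+) / )?           # capture 1: optional directory name before a slash
    ( [\w-]+ \. (?:gif|jpg) )   # capture 2: file name ending in .gif or .jpg
    "?                          # optional closing quote
}xi;

my $line = '<IMG SRC="SOMEDIR/FILENAME.GIF" MORE STUFF>';
if ($line =~ $img_re) {
    my ($dir, $file) = ($1, $2);
    print "directory: $dir\n";
    print "file:      $file\n";
}
```

For the example line this captures SOMEDIR as the directory and FILENAME.GIF as the file name.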

>				{
>				$directory = $2;
>
>				opendir(COPYMASTER, ".") or die "unable";
>				@nomast = readdir(COPYMASTER);
>				closedir(COPYMASTER);
>				mkdir("copied/$directory", 0666);
>
>				use File::Copy;
>
>				copy("pic/$dollar","copied/$directory/$dollar");
>				copy("Copy.pm",\*STDOUT);'
>
>				use POSIX;
>				use File::Copy cp;
>
>				$n=FileHandle->new("/dev/null","r");
>				cp($n,"x");'

If the file was matched, it would be more efficient to exit the inner 
foreach with the 'last;' statement
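
A small, separate sketch of what leaving a loop with 'last' looks like. The helper first_match and the sample lines are made up for illustration. It assumes each image only needs to be handled once; if the same image can appear under several directories in the dump, leaving early would skip the later ones.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first line that mentions $name,
# leaving the loop as soon as a match is found.
sub first_match {
    my ($name, @lines) = @_;
    my $hit;
    foreach my $line (@lines) {
        if ($line =~ /\Q$name\E/i) {   # \Q..\E keeps the dot in the name literal
            $hit = $line;
            last;                       # skip the remaining lines
        }
    }
    return $hit;
}

my @search = ("nothing here", 'SRC="a/logo.gif"', 'SRC="b/logo.gif"');
my $found = first_match("logo.gif", @search);
print "first match: $found\n";
```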

>				}
>			#else
>			#	{
>			#	unlink "pic/$dollar";
>			#	}
>			}
>		}
>	}
>
>$end = (times)[0];
>$secs = $end - $start;
>
>open(FOUR, ">time.txt") ;
>print FOUR $secs;
>close(FOUR);
>

HTH

Best regards

--Thomas

==== Want to unsubscribe from this list?
==== Send mail with body "unsubscribe" to
macperl-anyperl-request@macperl.org
