
Re: [MacPerl-AnyPerl] Regular Expression Problem



Richard Gordon wrote:
> 
> I did run into a greed-related problem along with a few others.

Oh, and if your "See also" text extends over multiple lines, the solutions
that don't use a character class would fail without the /s modifier, since
'.' doesn't match a newline by default.  :)
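
To make the /s point concrete, here is a minimal sketch. The sample string
and the '.+?' pattern are assumptions for illustration (the earlier
solutions from this thread aren't quoted here); the last pattern uses the
same negated character class as the code below.

```perl
use strict;
use warnings;

# Hypothetical sample: a "See also" reference that wraps across two lines.
my $text = "<I>See also</I> Apples and\nOranges.";

# Without /s, '.' does not match "\n", so the wrapped reference is missed.
my $without_s  = $text =~ /<I>See\salso<\/I>\s(.+?)\./  ? 1 : 0;

# With /s, '.' also matches "\n", so the match succeeds.
my $with_s     = $text =~ /<I>See\salso<\/I>\s(.+?)\./s ? 1 : 0;

# A negated character class matches newlines with or without /s.
my $with_class = $text =~ /<I>See\salso<\/I>\s([^.]+)\./ ? 1 : 0;

print "without /s: $without_s, with /s: $with_s, char class: $with_class\n";
```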

> What 
> I wound up with follows (I don't know about efficiency, but while 
> this won't even run on my Mac due to memory exhaustion, it blows thru 
> a 5 meg file on Solaris in less than 2 seconds and seems to do 
> exactly what I wanted):
> 

> open(IN, "<$input") ||
>     die "Can't open $infile $!\n";
> 
> $text = <IN>;
> 
> close(IN);

It's not surprising that you run into memory problems; you're reading the
whole file into memory, and then performing substitutions on it...  How is
your text formatted?  Perhaps you could read in paragraphs at a time?

> open(OUT, ">$output") ||
>     die "Can't open $outfile $!\n";
> 
> select(OUT);
> 
> $text =~ s/<I>See\salso<\/I>\s([^\.]+)\.+?/&see_also_refs($1).'.'/ge;
> $text =~ s/<I>See<\/I>\s([^\.]+)\.+?/&see_refs($1).'.'/ge;
> print $text;

So the above becomes...

$/ = '';   # paragraph mode: each read returns one blank-line-delimited chunk

# ...

open(IN, "<$input") ||
     die "Can't open $input $!\n";

open(OUT, ">$output") ||
     die "Can't open $output $!\n";

select(OUT);

while (defined($text = <IN>)) {

   $text =~ s/<I>See\salso<\/I>\s([^\.]+)\.+?/&see_also_refs($1).'.'/ge;
   $text =~ s/<I>See<\/I>\s([^\.]+)\.+?/&see_refs($1).'.'/ge;
   print $text;

}


If you don't have paragraphs, you could read in 'sentences'; you already
know your target text ends with a period anyway:

$/ = '.';
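
Here's a rough, self-contained sketch of that sentence-at-a-time loop.
The sample text and the body of see_refs() are made up for illustration
(your real see_refs() isn't shown here), and reading from an in-memory
filehandle assumes Perl 5.8 or later; with a real file you'd open it as
above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the see_refs() routine, which is not shown
# in this thread.
sub see_refs { my ($refs) = @_; return "<I>See</I> [" . $refs . "]"; }

# Hypothetical sample input, read through an in-memory filehandle.
my $sample = "First sentence. <I>See</I> Apples. Last one.";
open(my $in, '<', \$sample) || die "Can't open in-memory sample: $!\n";

my @records;
{
    local $/ = '.';   # each read returns text up to and including a period
    while (defined(my $text = <$in>)) {
        $text =~ s/<I>See<\/I>\s([^.]+)\./see_refs($1).'.'/ge;
        push @records, $text;
    }
}
close($in);

print scalar(@records), " records\n";
print join('', @records), "\n";
```

Only one 'sentence' is held in memory at a time, and since each record
ends at the first period, a reference of the form matched above never
straddles two records.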


Of course, this is only important if you really want to do this on your
Mac, since you said you can already do it just fine on your Solaris box.


Ronald
