
Re: [MacPerl-AnyPerl] Regular Expression Problem



At 16:50 -0400 4/29/1999, Ronald J. Kimball wrote:
A more efficient regex would use a negated character class with greedy
matching, as in:

s/See also ([^.]+)\./ whatever /eg;


Ronald
Campaign to Stamp Out Needless Use of Non-Greedy Matching
:)
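
For reference, here is a minimal sketch of the two styles side by side. The sample string is made up for illustration and is not taken from the dictionary file; the bracketed replacement is just a stand-in for "whatever".

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical sample text, not from the real data.
my $sample = "See also aleph; beth. Unrelated sentence. See also gimel.";

# Non-greedy version: .+? grows one character at a time until \. can match.
(my $lazy = $sample) =~ s/See also (.+?)\./[$1]/g;

# Negated character class: [^.]+ runs straight to the next period.
(my $negated = $sample) =~ s/See also ([^.]+)\./[$1]/g;

print "$lazy\n";
print "$negated\n";
print "same result\n" if $lazy eq $negated;
```

One difference worth knowing: without /s, . does not match a newline, so the non-greedy version fails on a reference that wraps across lines, while [^.]+ carries on to the next period.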

I did run into a greed-related problem, along with a few others. What I wound up with follows. I don't know about efficiency: it won't even run on my Mac because it runs out of memory, but it blows through a 5 MB file on Solaris in less than 2 seconds and seems to do exactly what I wanted:

#!/usr/bin/perl -w

######
# Declare variables:
$/ = undef ;
$indir = "/home/sbl-home/EPubs/d/" ;
$outdir = "/home/sbl-home/EPubs/d/" ;
$infile = "dictionary2.txt" ;
$outfile = "dictionary.txt" ;

######
# Begin program:
$input = $indir.$infile;
$output = $outdir.$outfile;

open(IN, "<$input") ||
die "Can't open $input for reading: $!\n";


$text = <IN>;

close(IN);

open(OUT, ">$output") ||
die "Can't open $output for writing: $!\n";


select(OUT);

# [^.]+ stops at the first period, so a plain \. is all that is needed after it.
$text =~ s/<I>See\salso<\/I>\s([^.]+)\./&see_also_refs($1).'.'/ge;
$text =~ s/<I>See<\/I>\s([^.]+)\./&see_refs($1).'.'/ge;
print $text;

######
# Subroutines
sub see_also_refs {
my $oldstr = shift;


my @words = split(/;\s?/, $oldstr);
my $newstr = "<I>See also</I> ";
foreach $word (@words) {
$href1 = "term=";
$href2 = "&case=i";
$href = "$href1$word$href2";
$href =~ s/([^=&a-zA-Z0-9_\-.,])/uc sprintf("%%%02x",ord($1))/eg;
$newstr .= '<A HREF="/cgi-bin/SBL/searchdict.pl?'.$href.'">'.$word.'</A>; ';
}
$newstr =~ s/;\s*$//; # drop the trailing "; " (chop alone would leave the ";")
$newstr;
}

sub see_refs {
my $oldstr = shift;


my @words = split(/;\s?/, $oldstr);
my $newstr = "<I>See</I> ";
foreach $word (@words) {
$href1 = "term=";
$href2 = "&case=i";
$href = "$href1$word$href2";
$href =~ s/([^=&a-zA-Z0-9_\-.,])/uc sprintf("%%%02x",ord($1))/eg;
$newstr .= '<A HREF="/cgi-bin/SBL/searchdict.pl?'.$href.'">'.$word.'</A>; ';
}
$newstr =~ s/;\s*$//; # drop the trailing "; " (chop alone would leave the ";")
$newstr;
}

######
# End program
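
Since see_also_refs and see_refs differ only in the label, the two could probably be folded into one routine. What follows is only a sketch under a few assumptions: make_refs is a hypothetical name that is not in the script above, the sample text is invented, and the escaping is applied to the search term alone so that an "&" or "=" inside a term cannot be read as query-string syntax. Using join also means there is no trailing separator to remove.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper (not part of the script above): builds the link
# list for either label.
sub make_refs {
    my ($label, $oldstr) = @_;
    my @links;
    foreach my $word (split /;\s?/, $oldstr) {
        # Escape only the term itself, leaving "term=" and "&case=i" alone.
        (my $term = $word) =~ s/([^a-zA-Z0-9_\-.,])/sprintf("%%%02X", ord($1))/eg;
        push @links, '<A HREF="/cgi-bin/SBL/searchdict.pl?term=' . $term
                   . '&case=i">' . $word . '</A>';
    }
    # join puts "; " between the links only, so nothing needs chopping.
    return "<I>$label</I> " . join('; ', @links);
}

# Made-up sample text for illustration.
my $text = "<I>See also</I> ark; burnt offering. <I>See</I> Q&A.";

# One pass handles both labels; $1 is "See" or "See also".
$text =~ s/<I>(See(?:\salso)?)<\/I>\s([^.]+)\./make_refs($1, $2) . '.'/ge;
print "$text\n";
```

The single substitution captures the label as well as the reference list, so the label is passed through to the output unchanged.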





Richard Gordon
--------------------
Gordon Consulting & Design
Database Design/Scripting Languages
mailto://maccgi@bellsouth.net
770.565.8267