
[MacPerl] pure Perl:-Bulk matching with REGEX



I apologise now if I'm posting this to the wrong MacPerl list.

I'm working on a script that reads in a text file and a dictionary file, and uses the dictionary file to process the text file in terms of word frequency and/or manipulation through regex matching.
Armed with Learning Perl (the blue Llama), Programming Perl (the blue Camel), Advanced Perl Programming (the black panther standing on a blue box) and the PODs, I _cannot_ find anything that I can understand to help me achieve this end.

This snippet from perlfaq6:
    while (<>) {
        while ( /(\b[^\W_\d][\w'-]+\b)/g ) {   # misses "`sheep'"
            $seen{$1}++;
        }
    }
    while ( ($word, $count) = each %seen ) {
        print "$count $word\n";
    }

This does indeed give me a word count (and I understand what the REGEX is looking for), but I can't get it to count words from the external dictionary, even using something like:

    while (<>) {
        $file='path:to:dictionary';
        open (IN, $file);
        while ( /$file/g ) {   # misses "`sheep'"
            $seen{$1}++;
        }
    }

    while ( ($word, $count) = each %seen ) {
        print "$count $word\n";
    }

On top of that, the same perlfaq6 states that this kind of processing is highly inefficient and suggests using something like the following:

    sub _bm_build {
        my $condition = shift;
        my @regexp = @_;  # this MUST not be local(); need my()
        my $expr = join $condition => map { "m/\$regexp[$_]/o" } (0..$#regexp);
        my $match_func = eval "sub { $expr }";
        die if $@;  # propagate $@; this shouldn't happen!
        return $match_func;
    }

    sub bm_and { _bm_build('&&', @_) }
    sub bm_or  { _bm_build('||', @_) }

    $f1 = bm_and qw{
            xterm
            (?i)window
    };

    $f2 = bm_or qw{
            \b[Ff]ree\b
            \bBSD\B
            (?i)sys(tem)?\s*[V5]\b
    };

    # feed me /etc/termcap, prolly
    while ( <> ) {
        print "1: $_" if &$f1;
        print "2: $_" if &$f2;
    }


Of this I can only say that I seem to have found the chasm across which my logic will not leap. Any pointers to web pages, documentation, or just plain explanations would be gratefully received.