[MacPerl] pure Perl:-Bulk matching with REGEX
I apologise now if I'm posting this to the wrong MacPerl list.
I'm working on a script that reads in a text file and a dictionary file, and uses the dictionary to process the text file: counting word frequencies and/or manipulating the text through regex matching.
Armed with Learning Perl (the blue Llama), Programming Perl (the blue camel), Advanced Perl Programming (the black panther standing on a blue box) and the PODs, I _cannot_ find anything that helps me achieve this (that I can understand).
This snippet from perlfaq6:
while (<>) {
    while ( /(\b[^\W_\d][\w'-]+\b)/g ) {   # misses "`sheep'"
        $seen{$1}++;
    }
}
while ( ($word, $count) = each %seen ) {
    print "$count $word\n";
}
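For reference, here is the FAQ pattern taken apart, wrapped in a hypothetical `count_words` sub (not part of perlfaq6) that works on a string instead of `<>` so it is easy to try out. One consequence worth knowing: `[^\W_\d]` followed by `[\w'-]+` needs at least two characters, so one-letter words such as "a" and "I" are never counted.

```perl
use strict;
use warnings;

# count_words is a hypothetical helper, not part of perlfaq6.
# The pattern is the FAQ's:
#   \b          start at a word boundary
#   [^\W_\d]    first character: a word char that is not _ or a digit (a letter)
#   [\w'-]+     then one or more word chars, apostrophes or hyphens
#   \b          end at a word boundary
sub count_words {
    my ($text) = @_;
    my %seen;
    while ( $text =~ /(\b[^\W_\d][\w'-]+\b)/g ) {
        $seen{$1}++;
    }
    return \%seen;
}

my $counts = count_words("the cat's hat and the cat");
print "$counts->{$_} $_\n" for sort keys %$counts;
# prints:
# 1 and
# 1 cat
# 1 cat's
# 1 hat
# 2 the
```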
The FAQ snippet does indeed give me a word count (and I understand what the regex is looking for), but not one limited to the words in my external dictionary, even when I try something like:
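while (<>) {
    $file='path:to:dictionary';   # MacPerl (colon-separated) path to the dictionary
    open (IN, $file);             # reopened for every input line, and IN is never read
    while ( /$file/g ) {          # uses the path string itself as the pattern, not the dictionary's words
        $seen{$1}++;              # $1 stays undefined: the pattern has no capturing group
    }
}
while ( ($word, $count) = each %seen ) {
    print "$count $word\n";
}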
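One way to get counts restricted to the dictionary is to turn the logic around: read the dictionary into a hash once, then look up each word the FAQ's regex finds in the text. This is a minimal sketch, assuming the dictionary holds one word per line and that case should be ignored; the sub names `load_dictionary` and `count_dictionary_words` are hypothetical. Trimming each dictionary line with `\s` also removes any stray `\r` left by a different line-ending convention.

```perl
use strict;
use warnings;

# Hypothetical helper: read a dictionary from an open filehandle.
# ASSUMPTION: one word per line; blank lines are ignored.
sub load_dictionary {
    my ($fh) = @_;
    my %dict;
    while ( my $line = <$fh> ) {
        chomp $line;
        $line =~ s/^\s+|\s+$//g;    # trim leading/trailing whitespace (including \r)
        next unless length $line;   # skip blank lines
        $dict{ lc $line } = 1;      # lowercase so lookups ignore case
    }
    return \%dict;
}

# Hypothetical helper: count, in the lines read from $fh,
# only the words (found with the perlfaq6 pattern) that are in %$dict.
sub count_dictionary_words {
    my ( $dict, $fh ) = @_;
    my %seen;
    while ( my $line = <$fh> ) {
        while ( $line =~ /(\b[^\W_\d][\w'-]+\b)/g ) {
            my $word = lc $1;
            $seen{$word}++ if exists $dict->{$word};
        }
    }
    return \%seen;
}

# Usage (the dictionary path is a placeholder):
#   open my $dict_fh, '<', 'path:to:dictionary' or die "Can't open dictionary: $!";
#   my $dict = load_dictionary($dict_fh);
#   close $dict_fh;
#   my $seen = count_dictionary_words( $dict, \*STDIN );
#   print "$seen->{$_} $_\n" for sort keys %$seen;
```

A hash lookup costs roughly the same whether the dictionary has ten words or ten thousand, which is why the dictionary is loaded into a hash rather than turned into one regex per word.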
Plus, the same perlfaq6 says that matching each line against many regexes one at a time is highly inefficient, and suggests something like the following instead:
sub _bm_build {
    my $condition = shift;
    my @regexp = @_;  # this MUST not be local(); need my()
    my $expr = join $condition => map { "m/\$regexp[$_]/o" } (0..$#regexp);
    my $match_func = eval "sub { $expr }";
    die if $@;  # propagate $@; this shouldn't happen!
    return $match_func;
}
sub bm_and { _bm_build('&&', @_) }
sub bm_or  { _bm_build('||', @_) }
$f1 = bm_and qw{
    xterm
    (?i)window
};
$f2 = bm_or qw{
    \b[Ff]ree\b
    \bBSD\B
    (?i)sys(tem)?\s*[V5]\b
};
# feed me /etc/termcap, prolly
while ( <> ) {
    print "1: $_" if &$f1;
    print "2: $_" if &$f2;
}
Of which I can only say that I seem to have found the chasm across which my logic will not leap. Any pointers to web pages or documentation, or just plain explanations, would be gratefully received.
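For the "manipulation" side, and as an alternative to the `eval`-built closures in the perlfaq6 code, one option (not the one the FAQ shows) is to join all dictionary words into a single alternation and compile it once with `qr//`, which has been available since Perl 5.005 and removes the need for the `/o` and `eval` tricks. This is a sketch assuming the dictionary entries are plain words; the sub names are hypothetical. For very large dictionaries the hash lookup approach may still be cheaper, although perl 5.10 and later can optimize alternations of literal strings into a trie.

```perl
use strict;
use warnings;

# Hypothetical helper: build ONE precompiled, case-insensitive regex
# matching any of @words as a whole word.
# ASSUMPTION: entries are plain words, so \b on both sides makes sense.
sub build_dictionary_regex {
    my @words = @_;
    return qr/(?!)/ unless @words;   # empty dictionary: a regex that never matches
    my $alternation = join '|',
        map  { quotemeta }           # escape regex metacharacters in each entry
        sort { length($b) <=> length($a) or $a cmp $b } @words;  # longest first
    return qr/\b(?:$alternation)\b/i;
}

# Hypothetical helper: wrap every dictionary word in $text with markers.
sub mark_dictionary_words {
    my ( $re, $text, $open, $close ) = @_;
    ( my $copy = $text ) =~ s/($re)/$open$1$close/g;
    return $copy;
}

# Hypothetical helper: count dictionary words in $text (lowercased keys).
sub count_matches {
    my ( $re, $text ) = @_;
    my %seen;
    $seen{ lc $1 }++ while $text =~ /($re)/g;
    return \%seen;
}

# Usage:
#   my $re = build_dictionary_regex(qw(cat hat));
#   print mark_dictionary_words( $re, "The Cat in the hat", '[', ']' ), "\n";
```

Sorting longest entries first matters only when one entry can end at a word boundary inside another (for example a multi-word phrase); for single words the surrounding `\b` already forces whole-word matches.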