Aaron Crane wrote: > > Andrew Pimlott <andrew@pimlott.ne.mediaone.net> writes: > > Actually, I think it would be very cool to have a series of x-to-re > > modules that convert various pattern languages into perl regular > > expressions. All of the various glob styles are obvious candidates. > > I was toying with something a couple of days ago that led me to this. It > takes glob patterns as accepted by GNU locate(1) (complete with the logic > about whether or not to anchor the match) and gives you a qr// regex. You > get undef for an invalid glob pattern. The code's probably a little too > long, dull, and awkward to qualify as genuinely Fun, but the task has some > annoying little niggles that aren't entirely trivial. > > sub locate_glob_to_regex { > my $s = shift; > > # Anything backslash-escaped should be left as is. A question-mark > # should become a dot. A star should become a dot-star. A > # bracket-expression should be left as-is, except that a bang used to > # negate should become a caret. Any other non-\w char should be > # backslash-escaped. > > # Once we've done all that, we qr// it, *without* \Q, and anchor it if > # necessary. > > my $rx = ""; > my $found_meta = 0; > > for (my $i = 0; $i < length($s); $i++) { > my $c = substr($s, $i, 1); > if ($c eq "*") { > $found_meta = 1; > $rx .= ".*"; > } > elsif ($c eq "?") { > $found_meta = 1; > $rx .= "."; > } > elsif ($c eq "\\") { > return undef if ++$i >= length($s); > $rx .= $c . substr($s, $i, 1); > } > elsif ($c eq "[") { > $found_meta = 1; > > # Swallow a bracket expression and add it to $rx. Points to > # watch for: if the *first* character is a bang, make it a > # caret. If the first non-bang, non-caret character is a > # close-bracket, it doesn't close the expression. Thereafter, > # everything goes in unchanged, up to and including the first > # close-bracket. > > $rx .= $c; > > return undef if ++$i >= length($s); > $c = substr($s, $i, 1); > > if ($c eq "!" or $c eq "^") { > $rx .= "^"; > return undef if ++$i >= length($s); > $c = substr($s, $i, 1); > } > > # Whatever the first non-bang, non-caret char was, we add it > # to the rx. > $rx .= $c; > > for (;;) { > return undef if ++$i >= length($s); > $c = substr($s, $i, 1); > $rx .= $c; > last if $c eq "]"; > } > } > elsif ($c =~ /\w/) { > $rx .= $c; > } > else { # non-\w, non-special > $rx .= "\\$c"; > } > } > > $rx = "^$rx\$" if $found_meta; > return qr/$rx/; > } > Change "too long, dull, and awkward" to "short, exciting, and ???????" and you get "Fun with Perl"... sub locate_glob_to_regex { local $_ = "\e" . shift; s/\e\*/.*\a\e/ or s/\e\?/.\a\e/ or /\e\\$/ and return(undef) or s/\e(\\.|\w+)/$1\e/ or /\e\[[\!\^]\][^\]]*$/ and return(undef) or s/\e\[([\^\!]?)(\].*?)\]/[@{[$1 && '^']}$2]\a\e/ or s/\e\[([\^\!]?)([^]]+)\]/[@{[$1 && '^']}$2]\a\e/ or s/\e([^[])/\\$1\e/ or return(undef) until /\e$/; $_ = "^$_\$" if tr/\e\a//d > 1; return qr/$_/; } This *was* Fun... -- Rick ==== Want to unsubscribe from Fun With Perl? Well, if you insist... ==== Send email to <fwp-request@technofile.org> with message _body_ ==== unsubscribe