Re: [FWP] keyword parser for search engine



Aaron Crane wrote:
> 
> Rick <rklement@pacbell.net> writes:
> > sub locate_glob_to_regex {
> >   local $_ = "\e" . shift;
> >   s/\e\*/.*\a\e/ or
> >     s/\e\?/.\a\e/ or
> >     /\e\\$/ and return(undef) or
> >     s/\e(\\.|\w+)/$1\e/ or
> >     /\e\[[\!\^]\][^\]]*$/ and return(undef) or
> >     s/\e\[([\^\!]?)(\].*?)\]/[@{[$1 && '^']}$2]\a\e/ or
> >     s/\e\[([\^\!]?)([^]]+)\]/[@{[$1 && '^']}$2]\a\e/ or
> >     s/\e([^[])/\\$1\e/ or return(undef)
> >     until /\e$/;
> >   $_ = "^$_\$" if tr/\e\a//d > 1;
> >   return qr/$_/;
> > }
> >
> > This *was* Fun...
> 
> I actually considered a wildly different implementation which played the
> same sorts of tricks with unusual characters, but decided against it,
> because it doesn't work.  (I believe your code won't handle globs that
> contain \e.  Sure, you won't get many of those, but I prefer not to
> sacrifice correctness for Fun.)
> 
> It's a shame, though -- your code is quite nice in a twisted sort of way.
> Perhaps switching to \0 instead of \e would do the trick?  File names can't
> contain \0, and in my original application I was getting the globs off the
> command line, which of course also can't contain \0.  I'm not sure this will
> work, though, because you grab \a as well.  Maybe you could use (say) \0\0
> and \0\1 instead of \e and \a?  Of course, the tr/// would have to change to
> "... if tr/\0\1//d > 2".
> 

You're right - but it's Perl and therefore easy to fix...

Given \000 as an available marker, let's twist a little more
and do this:

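sub locate_glob_to_regex {
  # "\000\000" is a cursor: everything to its left is already translated.
  local $_ = "\000\000" . shift;
  s/\000\000\*/\000.*\000\000/ or                     # *  (tagged with one \000)
    s/\000\000\?/\000.\000\000/ or                    # ?  (tagged with one \000)
    /\000\000\\$/ and return(undef) or                # trailing backslash: invalid
    s/\000\000(\\.|\w+)/$1\000\000/ or                # escaped char or word chars
    /\000\000\[[\!\^]\][^\]]*$/ and return(undef) or  # [!] or [^] never closed: invalid
    s/\000\000\[([\^\!]?)(\].*?)\]/\000[@{[$1 && '^']}$2]\000\000/ or  # class starting with ]
    s/\000\000\[([\^\!]?)([^]]+)\]/\000[@{[$1 && '^']}$2]\000\000/ or  # ordinary class
    s/\000\000([^[])/\\$1\000\000/ or return(undef)   # quote anything else; lone [ is invalid
    until s/\000\000$//;                              # cursor at the end: drop it and stop
  # Any \000 still present tags a wildcard; anchor only if there was one.
  $_ = "^$_\$" if tr/\000//d;
  return qr/$_/;
}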

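By putting a single \000 in front of each wildcard replacement, a
second marker character is not needed: once the \000\000 cursor has
been stripped, tr/\000//d counts (and deletes) whatever \000s are
left, and the regex is anchored only if there were any.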
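
To see the single-marker idea in isolation, here is a cut-down sketch.
It is not the function above: the name sketch_glob_to_regex is made up
for illustration, it only treats * and ? as wildcards, and it quotes
every other character with \Q...\E instead of handling backslash
escapes and bracket classes. It also assumes the glob contains no \000.

```perl
use strict;
use warnings;

# Hypothetical, simplified variant: only '*' and '?' are wildcards.
# Assumes the input glob never contains \000.
sub sketch_glob_to_regex {
    local $_ = "\000\000" . shift;         # "\000\000" is the cursor
    s/\000\000\*/\000.*\000\000/           # '*': emit tagged '.*', advance
      or s/\000\000\?/\000.\000\000/       # '?': emit tagged '.', advance
      or s/\000\000(.)/\Q$1\E\000\000/s    # anything else: quote it, advance
      until s/\000\000\z//;                # cursor at the end: drop it, stop
    my $wildcards = tr/\000//d;            # strip the tags, counting them
    $_ = "^$_\$" if $wildcards;            # anchor only if a wildcard was seen
    return qr/$_/;
}

for my $name ('notes.txt', 'notes.txt.bak') {
    printf "%-14s %s\n", $name,
        ($name =~ sketch_glob_to_regex('*.txt') ? 'match' : 'no match');
}
```

With a wildcard present the result is anchored, so '*.txt' matches
notes.txt but not notes.txt.bak; a glob with no wildcard, such as
'foo', stays unanchored and matches anywhere in the name.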

> --
> Aaron Crane   <aaron.crane@pobox.com>   <URL:http://pobox.com/~aaronc/>

-- 
Rick
