[FWP] keyword parser for search engine



I'm not sure whether there is a need for one in module form, but I've written a
keyword parser for search engines that works on a simple level of wildcard
expansion.  I'm extending the program right now to allow for different
levels of regular expressions.  Right now it uses the shell-like wildcards *
and ?, which are equivalent to Perl's .* and . respectively.
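Here's a minimal sketch of that basic translation.  The helper name
wildcard_to_regex is hypothetical (it is not from the program linked
below), and it only does the plain * => .* and ? => . mapping, with no
word boundaries:

```perl
use strict;
use warnings;

# Hypothetical helper: translate shell-style wildcards to a regex string.
# quotemeta() backslash-escapes every non-word character, so regex
# metacharacters such as . + ( [ in the keyword are matched literally;
# the escaped wildcards are then turned back into regex syntax.
sub wildcard_to_regex {
    my ($keyword) = @_;
    my $re = quotemeta $keyword;
    $re =~ s/\\\*/.*/g;   # \*  =>  .*  (any run of characters)
    $re =~ s/\\\?/./g;    # \?  =>  .   (exactly one character)
    return $re;
}

for my $kw ('je*y', 'f?o', 'a.b*') {
    printf "%-6s => /%s/\n", $kw, wildcard_to_regex($kw);
}
```

Escaping everything first and then un-escaping only the two wildcards
means you never have to enumerate Perl's metacharacters yourself.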

Translating a keyword phrase that uses these simple wildcards into a Perl
regex isn't as short and sweet as one might hope.  Any other special regex
characters must be escaped, and this parser has rules like:

	jeff*		=>	/\bjeff/
	*rey		=>	/rey\b/
	Perl		=>	/\bPerl\b/
	je*y		=>	/\bje.*y\b/
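A rough sketch of just those four rules, under the assumption that a
leading or trailing * simply drops the \b on that side.  The helper
simple_keyword_regex is hypothetical, and it deliberately ignores the
complications that follow:

```perl
use strict;
use warnings;

# Hypothetical helper reproducing the simple rules listed above:
# a leading/trailing '*' removes the \b on that side; otherwise the
# keyword is anchored with \b.  Interior '*' and '?' become .* and .
sub simple_keyword_regex {
    my ($kw) = @_;
    my $open  = ($kw =~ s/^\*+//) ? '' : '\b';
    my $close = ($kw =~ s/\*+$//) ? '' : '\b';
    my $re = quotemeta $kw;     # escape other regex metacharacters
    $re =~ s/\\\*/.*/g;
    $re =~ s/\\\?/./g;
    return qr/$open$re$close/;
}

my $text = 'jeffrey likes Perl';
for my $kw ('jeff*', '*rey', 'Perl', 'je*y') {
    printf "%-6s %s\n", $kw,
        ($text =~ simple_keyword_regex($kw) ? 'matches' : 'no match');
}
```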

But they're not THAT easy.  '*' isn't really .*; it's \w*.  And those \b
boundaries are actually:

	jeff*		=>	/(?:\b(?=\w)|(?=\W))jeff/
	*jeff		=>	/jeff(?:(?<=\w)\b|(?<=\W))/
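Putting the refined pieces together might look like this.  It's a sketch,
not the linked program: keyword_regex is a hypothetical name, the two
boundary assertions are copied from above, '*' becomes \w*, and '?' is
left as '.' since only '*' was refined here:

```perl
use strict;
use warnings;

# Boundary assertions quoted from the text: \b is demanded only when
# the adjacent keyword character is itself a word character.
my $OPEN  = '(?:\b(?=\w)|(?=\W))';
my $CLOSE = '(?:(?<=\w)\b|(?<=\W))';

# Hypothetical helper: '*' => \w* (cannot run across spaces);
# '?' is kept as '.' per the original rule.
sub keyword_regex {
    my ($kw) = @_;
    my $open  = ($kw =~ s/^\*+//) ? '' : $OPEN;
    my $close = ($kw =~ s/\*+$//) ? '' : $CLOSE;
    my $re = quotemeta $kw;
    $re =~ s/\\\*/\\w*/g;
    $re =~ s/\\\?/./g;
    return qr/$open$re$close/;
}

# \w* keeps je*y inside one word, where .* would span "jeff and harvey".
for my $s ('jeffrey', 'jeff and harvey') {
    printf "%-16s %s\n", $s, ($s =~ keyword_regex('je*y') ? 'match' : 'no match');
}
```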

This is because if the search string is something like

	what...

And the string being matched against is

	As he turned his head, he mumbled, "what..."

As you can see, /\bwhat\.\.\.\b/ would fail, because there is NOT a \b
between the final . and the closing " (both are non-word characters).
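A quick check of that example, comparing the naive pattern against one
using the trailing assertion given earlier (the variable names are just
for illustration):

```perl
use strict;
use warnings;

my $text = q{As he turned his head, he mumbled, "what..."};

# Naive: the trailing \b needs a word/non-word transition, but the
# last '.' and the following '"' are both non-word characters.
my $naive = qr/\bwhat\.\.\.\b/;

# Refined: only demand \b when the preceding character is a word char.
my $fixed = qr/\bwhat\.\.\.(?:(?<=\w)\b|(?<=\W))/;

print 'naive: ', ($text =~ $naive ? 'match' : 'no match'), "\n";  # naive: no match
print 'fixed: ', ($text =~ $fixed ? 'match' : 'no match'), "\n";  # fixed: match
```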

Anyway, the little program (as it is for now) is available for comment:

  http://www.crusoe.net/~jeffp/perl/docs/regexes/keyword-parser

Let me know what you think.  It's backslash-happy. :)

-- 

  MIDN 4/C PINYAN, USNR, NROTCURPI     http://www.pobox.com/~japhy/
  jeff pinyan: japhy@pobox.com     perl stuff: japhy+perl@pobox.com
  "The Art of Perl"               http://www.pobox.com/~japhy/book/      
  CPAN ID: PINYAN  http://www.perl.com/CPAN/authors/id/P/PI/PINYAN/
  PerlMonth - An Online Perl Magazine     http://www.perlmonth.com/
