[Date Prev][Date Next][Thread Prev][Thread Next]
[Search]
[Date Index]
[Thread Index]
Re: [FWP] Comparing Regular Expressions
On Tue, Sep 07, 1999 at 05:59:21PM -0500, D@i-works.com wrote:
> That's perhaps intractable, but at least well defined. Now
> a more interesting question: if a string is known to match
> two REs, then is there a sense in which one of the matches
> is "better", i.e. more specialized? How do you go about
> defining it? (If a string matches q/a.*/ that's intuitively
> more information than q/.*/.)
>
> I'll just go with length of the RE, sense be damned. But
> better ideas are appreciated,
Well, you can always play with the various regex debugging commands
(if you happen to have a perl compiled with -DDEBUGGING, every real
perl hacker should) using the -Dr flag which gives you the
pre-compiled form of the regex, its 'size' and a verbose explaination
of how its 'seen' by perl. All sorts of fun information, and that's
just the compile time stuff.
compiling RE `[bc]d(ef*g)+h[ij]k$'
size 43 first at 1
1: ANYOF(11)
11: EXACT <d>(13)
13: CURLYX {1,32767}(27)
15: OPEN1(17)
17: EXACT <e>(19)
19: STAR(22)
20: EXACT <f>(0)
22: EXACT <g>(24)
24: CLOSE1(26)
26: WHILEM(0)
27: NOTHING(28)
28: EXACT <h>(30)
30: ANYOF(40)
40: EXACT <k>(42)
42: EOL(43)
43: END(0)
anchored `de' at 1 floating `gh' at 3..2147483647 (checking floating)
stclass `ANYOF' minlen 7
When the regex is actually run it will show you all the steps the
regex is performing, the backtracking, etc...
Matching `[bc]d(ef*g)+h[ij]k$' against `abcdefg__gh__'
Setting an EVAL scope, savestack=3
2 <ab> <cdefg__gh_> | 1: ANYOF
3 <abc> <defg__gh_> | 11: EXACT <d>
4 <abcd> <efg__gh_> | 13: CURLYX {1,32767}
4 <abcd> <efg__gh_> | 26: WHILEM
0 out of 1..32767 cc=effff31c
4 <abcd> <efg__gh_> | 15: OPEN1
4 <abcd> <efg__gh_> | 17: EXACT <e>
5 <abcde> <fg__gh_> | 19: STAR
EXACT <f> can match 1 times out of 32767...
Setting an EVAL scope, savestack=3
6 <bcdef> <g__gh__> | 22: EXACT <g>
7 <bcdefg> <__gh__> | 24: CLOSE1
7 <bcdefg> <__gh__> | 26: WHILEM
1 out of 1..32767 cc=effff31c
Setting an EVAL scope, savestack=12
7 <bcdefg> <__gh__> | 15: OPEN1
7 <bcdefg> <__gh__> | 17: EXACT <e>
restoring \1 to 4(4)..7
failed, try continuation...
7 <bcdefg> <__gh__> | 27: NOTHING
7 <bcdefg> <__gh__> | 28: EXACT <h>
failed...
failed...
All this stuff is covered in the perldebug man page. I suppose you
could do something with it to figure out the "complexity" of a regex.
--
Michael G Schwern schwern@pobox.com
http://www.pobox.com/~schwern
/(?:(?:(1)[.-]?)?\(?(\d{3})\)?[.-]?)?(\d{3})[.-]?(\d{4})(x\d+)?/i
==== Want to unsubscribe from Fun With Perl? Well, if you insist...
==== Send email to <fwp-request@technofile.org> with message _body_
==== unsubscribe