On Tue, Sep 07, 1999 at 05:59:21PM -0500, D@i-works.com wrote:
> That's perhaps intractable, but at least well defined. Now
> a more interesting question: if a string is known to match
> two REs, then is there a sense in which one of the matches
> is "better", i.e. more specialized? How do you go about
> defining it? (If a string matches q/a.*/ that's intuitively
> more information than q/.*/.)
>
> I'll just go with length of the RE, sense be damned. But
> better ideas are appreciated,

Well, you can always play with the various regex debugging
commands (if you happen to have a perl compiled with
-DDEBUGGING, which every real perl hacker should) using the -Dr
flag, which gives you the compiled form of the regex, its
'size', and a verbose explanation of how it's 'seen' by perl.
All sorts of fun information, and that's just the compile-time
stuff.

compiling RE `[bc]d(ef*g)+h[ij]k$'
size 43 first at 1
   1: ANYOF(11)
  11: EXACT <d>(13)
  13: CURLYX {1,32767}(27)
  15:   OPEN1(17)
  17:     EXACT <e>(19)
  19:     STAR(22)
  20:       EXACT <f>(0)
  22:     EXACT <g>(24)
  24:   CLOSE1(26)
  26:   WHILEM(0)
  27: NOTHING(28)
  28: EXACT <h>(30)
  30: ANYOF(40)
  40: EXACT <k>(42)
  42: EOL(43)
  43: END(0)
anchored `de' at 1 floating `gh' at 3..2147483647 (checking floating) stclass `ANYOF' minlen 7

When the regex is actually run, it shows you every step the
engine takes, including the backtracking:

Matching `[bc]d(ef*g)+h[ij]k$' against `abcdefg__gh__'
  Setting an EVAL scope, savestack=3
   2 <ab> <cdefg__gh_>    |  1: ANYOF
   3 <abc> <defg__gh_>    | 11: EXACT <d>
   4 <abcd> <efg__gh_>    | 13: CURLYX {1,32767}
   4 <abcd> <efg__gh_>    | 26: WHILEM
                              0 out of 1..32767  cc=effff31c
   4 <abcd> <efg__gh_>    | 15: OPEN1
   4 <abcd> <efg__gh_>    | 17: EXACT <e>
   5 <abcde> <fg__gh_>    | 19: STAR
                             EXACT <f> can match 1 times out of 32767...
  Setting an EVAL scope, savestack=3
   6 <bcdef> <g__gh__>    | 22: EXACT <g>
   7 <bcdefg> <__gh__>    | 24: CLOSE1
   7 <bcdefg> <__gh__>    | 26: WHILEM
                              1 out of 1..32767  cc=effff31c
  Setting an EVAL scope, savestack=12
   7 <bcdefg> <__gh__>    | 15: OPEN1
   7 <bcdefg> <__gh__>    | 17: EXACT <e>
                              restoring \1 to 4(4)..7
                              failed, try continuation...
   7 <bcdefg> <__gh__>    | 27: NOTHING
   7 <bcdefg> <__gh__>    | 28: EXACT <h>
                              failed...
                            failed...

All this stuff is covered in the perldebug man page. I suppose
you could use it to work out the "complexity" of a regex, for
instance from the compiled 'size' (43 for the example above).

-- 
Michael G Schwern                           schwern@pobox.com
                    http://www.pobox.com/~schwern
     /(?:(?:(1)[.-]?)?\(?(\d{3})\)?[.-]?)?(\d{3})[.-]?(\d{4})(x\d+)?/i

==== Want to unsubscribe from Fun With Perl? Well, if you insist...
==== Send email to <fwp-request@technofile.org> with message _body_
====   unsubscribe
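The quoted message settles on the length of the RE as a rough measure of how specialized a match is. Here is a minimal Perl sketch of that heuristic. It assumes the candidate patterns are held as plain strings. `most_specific` is a hypothetical helper, not part of perl or its regex debugger, and pattern length is only a crude stand-in for the compiled 'size' that -Dr reports.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given a string and several regex patterns (plain
# strings), return the "most specialized" pattern that matches the string,
# using the heuristic from the quoted message: the longest pattern wins.
# On a tie in length, the pattern listed first is kept.
# Returns undef if none of the patterns match.
sub most_specific {
    my ($str, @patterns) = @_;

    # keep only the patterns that actually match the string
    my @matching = grep { $str =~ /$_/ } @patterns;

    my $best;
    for my $re (@matching) {
        # a strictly longer pattern replaces the current pick
        $best = $re if !defined $best || length($re) > length($best);
    }
    return $best;
}

my $pick = most_specific("abcdefg", '.*', 'a.*', 'x.*');
print defined $pick ? "most specific: $pick\n" : "no match\n";
```

Run against 'abcdefg' with '.*', 'a.*' and 'x.*', it prints `most specific: a.*`. The pattern 'x.*' is dropped because it does not match, and 'a.*' beats '.*' on length.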