[Date Prev][Date Next][Thread Prev][Thread Next] [Search] [Date Index] [Thread Index]

Re: [FWP] Comparing Regular Expressions



On Tue, Sep 07, 1999 at 05:59:21PM -0500, D@i-works.com wrote:
> That's perhaps intractable, but at least well defined. Now
> a more interesting question: if a string is known to match
> two REs, then is there a sense in which one of the matches
> is "better", i.e. more specialized? How do you go about
> defining it? (If a string matches q/a.*/ that's intuitively
> more information than q/.*/.)
> 
> I'll just go with length of the RE, sense be damned. But
> better ideas are appreciated,

Well, you can always play with the various regex debugging commands
(if you happen to have a perl compiled with -DDEBUGGING, every real
perl hacker should) using the -Dr flag which gives you the
pre-compiled form of the regex, its 'size' and a verbose explaination
of how its 'seen' by perl.  All sorts of fun information, and that's
just the compile time stuff.

        compiling RE `[bc]d(ef*g)+h[ij]k$'
         size 43 first at 1
            1: ANYOF(11)
           11: EXACT <d>(13)
           13: CURLYX {1,32767}(27)
           15:   OPEN1(17)
           17:     EXACT <e>(19)
           19:     STAR(22)
           20:       EXACT <f>(0)
           22:     EXACT <g>(24)
           24:   CLOSE1(26)
           26:   WHILEM(0)
           27: NOTHING(28)
           28: EXACT <h>(30)
           30: ANYOF(40)
           40: EXACT <k>(42)
           42: EOL(43)
           43: END(0)
         anchored `de' at 1 floating `gh' at 3..2147483647 (checking floating)
                                           stclass `ANYOF' minlen 7

When the regex is actually run it will show you all the steps the
regex is performing, the backtracking, etc...


         Matching `[bc]d(ef*g)+h[ij]k$' against `abcdefg__gh__'
           Setting an EVAL scope, savestack=3
            2 <ab> <cdefg__gh_>    |  1: ANYOF
            3 <abc> <defg__gh_>    | 11: EXACT <d>
            4 <abcd> <efg__gh_>    | 13: CURLYX {1,32767}
            4 <abcd> <efg__gh_>    | 26:   WHILEM
                                       0 out of 1..32767  cc=effff31c
            4 <abcd> <efg__gh_>    | 15:     OPEN1
            4 <abcd> <efg__gh_>    | 17:     EXACT <e>
            5 <abcde> <fg__gh_>    | 19:     STAR
                                    EXACT <f> can match 1 times out of 32767...
           Setting an EVAL scope, savestack=3
            6 <bcdef> <g__gh__>    | 22:       EXACT <g>
            7 <bcdefg> <__gh__>    | 24:       CLOSE1
            7 <bcdefg> <__gh__>    | 26:       WHILEM
                                           1 out of 1..32767  cc=effff31c
           Setting an EVAL scope, savestack=12
            7 <bcdefg> <__gh__>    | 15:         OPEN1
            7 <bcdefg> <__gh__>    | 17:         EXACT <e>
              restoring \1 to 4(4)..7
                                           failed, try continuation...
            7 <bcdefg> <__gh__>    | 27:         NOTHING
            7 <bcdefg> <__gh__>    | 28:         EXACT <h>
                                           failed...
                                       failed...


All this stuff is covered in the perldebug man page.  I suppose you
could do something with it to figure out the "complexity" of a regex.

-- 

Michael G Schwern                                           schwern@pobox.com
                    http://www.pobox.com/~schwern
     /(?:(?:(1)[.-]?)?\(?(\d{3})\)?[.-]?)?(\d{3})[.-]?(\d{4})(x\d+)?/i

==== Want to unsubscribe from Fun With Perl?  Well, if you insist...
==== Send email to <fwp-request@technofile.org> with message _body_
====   unsubscribe