
[FWP] innermost first parsing

To: fwp@technofile.org
Subject: [FWP] innermost first parsing
From: Jeff Pinyan <jeffp@crusoe.net>
Date: Wed, 2 Aug 2000 11:34:55 -0400 (EDT)

Given text like (indented for sanity's sake):

  BEGIN
    alpha
    BEGIN
      beta
      gamma
      BEGIN
        delta
      END
      epsilon
      BEGIN
        zeta
        eta
      END
      theta
    END
  END

If you want to find the first BEGIN ... END block that does NOT contain
another BEGIN ... END pair (that is, the first innermost block), you can
use the following "unrolling the loop" style regex:

  ($first) = $text =~ m{
    BEGIN
    (
      [^BE]*  # 'B' and 'E' are first chars of tags
      (?:
        (?:
          B+ (?! EGIN )  # match /B+/ if NOT 'BEGIN'
          |
          E+ (?! ND )    # match /E+/ if NOT 'END'
        )
        [^BE]*
      )*
    )
    END
  }xs;

This has worked for the stress-testing I've done.
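
As a quick check, here is a minimal, self-contained sketch that runs the
same pattern against a cut-down version of the sample text. The variable
names ($innermost, $text) and the whitespace trimming are my own choices
for illustration, not part of the original snippet:

```perl
use strict;
use warnings;

# Cut-down sample: 'delta' is the first block with no nested BEGIN ... END.
my $text = <<'TEXT';
BEGIN
  alpha
  BEGIN
    beta
    BEGIN
      delta
    END
    epsilon
  END
END
TEXT

# Same pattern as above, stored in a qr// object ($innermost is an assumed name).
my $innermost = qr{
  BEGIN
  (
    [^BE]*                 # anything that cannot start a tag
    (?:
      (?:
        B+ (?! EGIN )      # a run of B's that does not begin 'BEGIN'
        |
        E+ (?! ND )        # a run of E's that does not begin 'END'
      )
      [^BE]*
    )*
  )
  END
}xs;

my ($first) = $text =~ $innermost;
$first = '' unless defined $first;   # no match leaves $first undefined
$first =~ s/^\s+|\s+$//g;            # trim the surrounding whitespace
print "$first\n";                    # prints "delta"
```

The outer two BEGINs fail to match because the body runs into another
BEGIN before it reaches an END, so the engine moves on until it reaches
the block around 'delta'.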

The application is:

  while ($text =~ s/REGEX/SOMETHING/) {
    $count++;
  }

where REGEX is the pattern above and SOMETHING is the replacement, so
that the innermost matches are dealt with first.
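
To make the loop concrete, here is one possible sketch of it. The
placeholder format "<N>" and the @blocks array are assumptions made up for
this example; the only real requirement is that the replacement never
contains BEGIN or END itself, or the loop would not terminate correctly:

```perl
use strict;
use warnings;

my $text = "BEGIN alpha BEGIN beta BEGIN delta END epsilon "
         . "BEGIN zeta END theta END END";

# The innermost-block pattern from above.
my $innermost = qr{
  BEGIN
  (
    [^BE]*
    (?:
      (?: B+ (?! EGIN ) | E+ (?! ND ) )
      [^BE]*
    )*
  )
  END
}xs;

my $count = 0;
my @blocks;   # block bodies, innermost first (hypothetical bookkeeping)

# Each pass replaces the first innermost block with a placeholder, which
# turns the enclosing block into an innermost block for a later pass.
while ($text =~ s/$innermost/<$count>/) {
    push @blocks, $1;   # $1 is the body of the block just replaced
    $count++;
}

print "$count blocks\n";
print "$_: '$blocks[$_]'\n" for 0 .. $#blocks;
print "remaining text: $text\n";
```

Because each placeholder stands in for a block that has already been
handled, the outer blocks are only processed after everything inside them
has been reduced.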

-- 
Jeff "japhy" Pinyan     japhy@pobox.com     http://www.pobox.com/~japhy/
PerlMonth - An Online Perl Magazine            http://www.perlmonth.com/
The Perl Archive - Articles, Forums, etc.    http://www.perlarchive.com/
CPAN - #1 Perl Resource  (my id:  PINYAN)        http://search.cpan.org/


