
Re: [MacPerl] - Regep question



tkimpton@maned.com, mac-perl@iis.ee.ethz.ch, design@netpass.com
Subj:	Re: [MacPerl] - Regep question

Tom Kimpton recently wrote to Chris Brinegar:
![stuff deleted]
!>
!>while (<>) {
!>  if (/\Q<\/font><\/font><\/form>\E/) {
!>    $puts = 1;
!>  }
!>  elsif (/(?s)\Q<br>\E\n\n\Q<br>\E\n/) {
!>   $puts = 0;
!>  }
!>
![stuff deleted]
!>
!>what I'm searching for is 
!>
!>"<br>(newline)(newline)<br>(newline)"
!>
![stuff deleted]
!
!
!You are matching on $_, which is being set by the <>, but, it's
!being set to a *single* line (everything up to and including the newline),
!and you're trying a match that spans multiple lines.  I believe
!I've seen a way to do it, but, I couldn't find it in my brief
!perusal of the manual.

Yes, that is exactly correct. One way around this impasse 
would be to stuff the whole input into a scalar, like so:

  @lines = (<>);
  $whole_file = join('',@lines);
  if ($whole_file =~ /\Q<\/font><\/font><\/form>\E/) {
      $puts = 1;
  }
  elsif ($whole_file =~ /(?s)\Q<br>\E\n\n\Q<br>\E\n/) {
      $puts = 0;
  }
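To see why the slurp helps, here is a minimal sketch that runs the same pattern against each line in turn and then against all the lines joined into one string. The sample text in $sample and the in-memory filehandle are assumptions, there only so the example runs on its own without a real HTML file.

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for the real HTML file.
my $sample = "text\n<br>\n\n<br>\nmore text\n";

# The multi-line pattern from the question, minus the (?s).
my $pattern = qr/\Q<br>\E\n\n\Q<br>\E\n/;

# Read the input one line at a time, as while (<>) does.
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my @lines = <$fh>;
close $fh;

# Line by line: each element holds at most one newline, so nothing matches.
my $line_matches = grep { /$pattern/ } @lines;

# Whole input: joining the lines restores the newlines between them.
my $whole_file    = join '', @lines;
my $whole_matches = ($whole_file =~ $pattern) ? 1 : 0;

print "line-by-line matches: $line_matches\n";   # 0
print "whole-file matches:   $whole_matches\n";  # 1
```

A common equivalent idiom is to undefine the input record separator while reading, e.g. `my $whole_file = do { local $/; <$fh> };`, which reads the rest of the file in one go.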

Reading the whole input this way will do what you want if there is at least 
one occurrence of either string (if both occur, only the first branch runs, 
so $puts ends up as 1) and you don't care what value $puts takes on repeated 
occurrences of one match or the other. I suspect that is the case with your 
script: before either match $puts is presumably undef(), and after a match 
it keeps the value 1 or 0 until the next contrary match. In other words, if 
you want $puts to switch between 1 and 0 across multiple chunks of an input 
file, the above may not do what you want. Perhaps you should also add an 
C<else { $puts = -1; }> default value.
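If you do need $puts to follow the most recent of the two markers in file order, one option is to scan the whole string with a single alternation under //g and let each match overwrite $puts. A minimal sketch, assuming the two marker strings from your script; the sample input and the @history array are hypothetical and only there to show the sequence of values.

```perl
use strict;
use warnings;

# Hypothetical input containing both markers, in a known order.
my $whole_file = "a</font></font></form>b<br>\n\n<br>\nc</font></font></form>d";

my $puts;       # undef until the first marker is seen
my @history;    # each value $puts takes, in file order

# One alternation scanned left to right with //g; $1 is defined only
# when the first alternative (the closing-tags marker) matched.
while ($whole_file =~ m{(</font></font></form>)|<br>\n\n<br>\n}g) {
    $puts = defined $1 ? 1 : 0;
    push @history, $puts;
}

print "final \$puts: $puts\n";   # 1
print "history: @history\n";     # 1 0 1
```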

The attempted use of (?s) is clever but unnecessary: (?s) only changes what 
. matches, and this pattern contains no ., so the following would work 
equally well:

  elsif ($whole_file =~ /\Q<br>\E\n\n\Q<br>\E\n/) {
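A quick way to convince yourself is to compare the pattern with and without (?s), and to contrast it with a pattern that does use ., since . is the only thing /s affects. A minimal sketch on made-up sample strings:

```perl
use strict;
use warnings;

my $text = "<br>\n\n<br>\n";

# No '.' in the pattern, so (?s) makes no difference.
my $with_s    = ($text =~ /(?s)\Q<br>\E\n\n\Q<br>\E\n/) ? 1 : 0;
my $without_s = ($text =~ /\Q<br>\E\n\n\Q<br>\E\n/)     ? 1 : 0;

# With '.', /s matters: by default '.' does not match a newline.
my $dot_default = ("a\nb" =~ /a.b/)  ? 1 : 0;
my $dot_s       = ("a\nb" =~ /a.b/s) ? 1 : 0;

print "with (?s): $with_s, without: $without_s\n";      # 1, 1
print "a.b default: $dot_default, with /s: $dot_s\n";   # 0, 1
```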

It certainly appears as though you are trying to parse HTML, eh? In that case 
you might want (strictly for generality's sake) to add an //i to both regexes, 
which is quick and dirty but may be expensive (slow). If you *know* that the 
tags are lower case, then leaving them as they are should be OK. If you do need 
case insensitivity, rewriting in terms of character classes may give you a 
speed improvement (note the use of /x, which lets the regex be broken across lines):

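 # Character classes must stay outside \Q...\E, which would escape the
 # brackets; /x ignores the whitespace used to break the pattern up.
 if ($whole_file =~ /<\/[fF][oO][nN][tT]>
                     <\/[fF][oO][nN][tT]>
                     <\/[fF][oO][rR][mM]>/x) {
     $puts = 1;
 }
 elsif ($whole_file =~ /<[bB][rR]>\n\n<[bB][rR]>\n/) {
     $puts = 0;
 }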
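Character classes only work outside \Q...\E: quotemeta escapes the brackets, so inside \Q the class [fF] stands for the four literal characters "[fF]". The sketch below (the mixed-case sample string is made up) checks that the character-class pattern and the //i pattern agree, and that the same classes wrapped in \Q no longer match real tags.

```perl
use strict;
use warnings;

# Hypothetical mixed-case input.
my $html = "x</FONT></Font></form>y";

# Character classes, with /x so the pattern can span several lines.
my $by_class = ($html =~ / <\/[fF][oO][nN][tT]>
                           <\/[fF][oO][nN][tT]>
                           <\/[fF][oO][rR][mM]> /x) ? 1 : 0;

# The same literal text matched case-insensitively with //i.
my $by_i = ($html =~ /<\/font><\/font><\/form>/i) ? 1 : 0;

# Inside \Q...\E the brackets are escaped, so the class becomes literal text.
my $quoted = ($html =~ /\Q<\/[fF][oO][nN][tT]>\E/) ? 1 : 0;

print "classes: $by_class, //i: $by_i, inside \\Q: $quoted\n";   # 1, 1, 0
```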

(BTW, you were correct that neither /m, /s, nor even /ms has 
 anything to do with this regex: /m only changes the meaning of ^ and $, 
 /s only changes the meaning of ., and you haven't used ^, $, or . in 
 either of them.)
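For contrast, here is where /m does matter: with it, ^ also matches just after each embedded newline instead of only at the start of the string. A minimal sketch on a made-up string, which also checks that /m leaves the anchor-free pattern from your script unaffected:

```perl
use strict;
use warnings;

my $text = "<br>\n\n<br>\n";

# Without /m, ^ matches only at the very start of the string.
my @plain = $text =~ /^<br>/g;

# With /m, ^ also matches just after each embedded newline.
my @multi = $text =~ /^<br>/mg;

# The original pattern uses no ^ or $, so adding /m changes nothing.
my $no_m   = ($text =~ /\Q<br>\E\n\n\Q<br>\E\n/)  ? 1 : 0;
my $with_m = ($text =~ /\Q<br>\E\n\n\Q<br>\E\n/m) ? 1 : 0;

print scalar(@plain), " vs ", scalar(@multi), "\n";   # 1 vs 2
print "without /m: $no_m, with /m: $with_m\n";        # 1, 1
```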

I hope that helps you.

Peter Prymmer

P.S. 
Yesterday was the 100th anniversary of the patenting of the 
Swiss Army Knife: happy birthday to a great tool, and 
happy Friday the 13th!

