
re: [MacPerl-WebCGI] pattern matching in html files



###  Robert Terwilliger wrote:

# > If I have an html file and I want to reduce "$string," which contains the
# > whole text of the file, to just the content of the <h1>'s, what do I do?
# > I have tried the following thus far but trial and error 

###  RTFM is better than trial and error! (See the Camel book on greedy and nongreedy matching.)

# >  have thus far not provided me with an answer.
# >       if ($string =~ m%<h1>(.*)</h1>%)ig {
# >           $string = $1  }
# >  It seems that this would only grab the first <h1>, not all of them.  How do
# >  I grab all of them?

###  Test your script before you post it: the modifiers belong directly after
###     the closing delimiter, so that expression should read ... ($string =~ m%<h1>(.*)</h1>%ig).
###  Also post a full example, like this:

###  Given some string ...

   $_ = 'img <a href=nix><h1> >>> Pe</h1><h1>rl</H1> oink <h3> oink %&$§<<<<< GARBAGE <H1> for</h1> NONSENSE <h1> per</h1><h1>fection! <<<</h1> fooo ';

###  ... you can extract in this way:

   while (/<h1>(.*?)<\/h1>/ig) { print $1 }
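###  As a variant of the loop above, a match with /g in list context returns
###  all captures at once, which fits the original wish to keep only the <h1>
###  contents. This is a minimal sketch, not a full HTML parser: the sample
###  text and the names $string and @headings are illustrative assumptions,
###  and it assumes plain <h1> tags without attributes or nesting.

```perl
use strict;
use warnings;

# Hypothetical sample input; $string stands for the whole HTML file.
my $string = 'intro <h1>First</h1> filler <H1>Second</h1> <h2>skip</h2> <h1>Third</H1> end';

# In list context, a /g match returns every capture in one go.
# /i ignores case in the tag names; /s lets . cross newlines,
# for headings that span several lines.
my @headings = $string =~ m{<h1>(.*?)</h1>}gis;

# Print one heading per line.
print join("\n", @headings), "\n";
```

###  To reduce $string itself to the headings, one could then assign
###  join("\n", @headings) back to it.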

###  Comments:
###    1. Do not forget the question mark, which makes the .* pattern nongreedy:
###            it then matches as few characters as possible.
###    2. A bare  / /  match searches within  $_ .
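###  To illustrate comment 1, here is a small side-by-side sketch of greedy
###  and nongreedy matching. The sample string and the names $html, @greedy
###  and @lazy are made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical sample with two headings right next to each other.
my $html = '<h1>Pe</h1><h1>rl</h1>';

# Greedy: .* runs on to the LAST </h1>, so one match swallows both headings.
my @greedy = $html =~ m{<h1>(.*)</h1>}gi;

# Nongreedy: .*? stops at the FIRST </h1>, so each heading matches separately.
my @lazy = $html =~ m{<h1>(.*?)</h1>}gi;

print scalar(@greedy), " greedy match: $greedy[0]\n";
print scalar(@lazy), " nongreedy matches: @lazy\n";
```

###  The greedy pattern yields a single capture that still contains the inner
###  </h1><h1> pair; the nongreedy pattern yields the two headings separately.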
###
###
###                       _\|/_
###    Detlef Lindenthal    o o   <detlef@lindenthal.com>
###                         '









