Hi,

Dave Babbitt wrote:
>
> Hi Guys!
>
> I'm having trouble extracting patterns from HTML. I am also confused with
> all the built-in $ stuff. All I want to do is create a key-value
> associative list with the anchor names as the key and the text between them
> as the value. Then I want to be able to search through the values and
> return the key if I have found what I am looking for. I can get the first
> "<a name=blah>" at the beginning of a $' by using $_ =~ /(<body[^>]*>[^<a
> ]*)/i; but how do I do the rest? The html would look like this:
> [sample deleted]
> Can anybody help?

Hope so ... What about this one:

    #!/usr/local/bin/perl -w

    open(FILE,"test.dat") or die "Oops!\n";
    @text = <FILE>;
    close(FILE);

    $text = join('',@text);   ## here $text contains the whole file

    $text =~ s/^.*?<body.*?>(.+?)$/$1/si;   ## remove everything from the
                                            ## beginning through '<body...>',
                                            ## if you insist on doing this

    while($text =~ m|<a name=(.+?)>(.*?)</a>|gsi)   ## that's all ...
    {
      print "name: $1\n";
      print "value: $2\n";
    }

Note some things here:

1) 'while' in combination with the "global" option ('/../g') iterates
   over the string and gives you all the matches, one per pass.

2) You must use the single-line option 's', so that the dot '.'
   matches newlines '\n', too.

3) The option 'i' is recommended here, as HTML tag and attribute names
   are case-insensitive.

4) The use of non-greedy search (note the '?'s in the patterns) is
   important here. A greedy '.+' or '.*' would run on to the last '>'
   or '</a>' in the file instead of stopping at the nearest one.

Bye, Eike
-- 
======================================================================
 Eike Grote, Theoretical Physics IV, University of Bayreuth, Germany
----------------------------------------------------------------------
 e-mail -> eike.grote@theo.phy.uni-bayreuth.de
 WWW    -> http://www.phy.uni-bayreuth.de/theo/tp4/members/grote.html
           http://www.phy.uni-bayreuth.de/~btpa25/
======================================================================
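
The question also asked for an associative list keyed by anchor name, and for a way to search the values and get the key back. The 'while' loop in the reply could be extended roughly as follows. This is only a sketch: the helper names 'anchors_to_hash' and 'find_key' and the sample HTML are made up for illustration (the original sample was deleted), and the pattern is a slightly loosened variant of the one in the reply that also accepts a quoted name. It assumes 'name' is the first and only attribute of the tag.

```perl
#!/usr/local/bin/perl -w
use strict;

# Hypothetical helper: collect every <a name=...>...</a> pair into a hash
# keyed by anchor name. Uses the same /gsi loop as the reply; the name may
# be quoted or unquoted (an assumption about the input).
sub anchors_to_hash {
    my ($html) = @_;
    my %text_for;
    while ($html =~ m|<a\s+name="?([^"\s>]+)"?\s*>(.*?)</a>|gsi) {
        $text_for{$1} = $2;
    }
    return %text_for;
}

# Hypothetical helper: return the first key (in sorted order) whose value
# contains $wanted as a literal, case-insensitive substring; returns
# undef in scalar context when nothing matches.
sub find_key {
    my ($wanted, %text_for) = @_;
    foreach my $name (sort keys %text_for) {
        return $name if $text_for{$name} =~ /\Q$wanted\E/i;
    }
    return;
}

# Made-up sample input standing in for the deleted sample in the question.
my $html = <<'END';
<html><body bgcolor=white>
<a name=intro>Getting
started</a>
<A NAME="usage">How to run it</A>
</body></html>
END

my %text_for = anchors_to_hash($html);
my $key = find_key('run it', %text_for);
print defined $key ? "found in: $key\n" : "not found\n";
```

With the sample above this prints "found in: usage". The \Q...\E around the search term keeps characters such as '.' or '?' in it from being treated as pattern syntax. For anything beyond simple, regular markup a real HTML parser is likely the safer route.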