
[MacPerl] Iterative Implementation of a Tail-Recursive $POST_MATCHed Split of a Slurped HTML Text



Hi Guys!

I'm having trouble extracting patterns from HTML. I am also confused by
all the built-in $ variables. All I want to do is build a key-value
associative array with the anchor names as the keys and the text between
them as the values. Then I want to search through the values and return
the key when I find what I am looking for. I can get the first
"<a name=blah>" at the beginning of $' by using
$_ =~ /(<body[^>]*>[^<a ]*)/i; but how do I do the rest? The HTML looks
like this:

<html>
	<head>
		<title>
			Anchor Test</title>
		</head>
	<body bgcolor=white>
		<a name=cover>
			<p>
				"Cover Page"</p>
			</a>
		<a name=toc>
			<p>
				"Table of Contents"</p>
			</a>
		</body>
	</html>

This Perl doesn't work:

#!/usr/local/bin/perl -0777

open (SFILE, 'anchorTest.htm') || die "Sorry, can't open file! $!\n";
while (<SFILE>) {
	$_ =~ /(<body[^>]*>[^<a ]*)/i;
	split (/\<a name=\"*([^\>\"])\"*\>/i, $');
	print "\n$`~$&~$'\n";
	foreach $_(@_) {
		($ANAME, $ATEXT) = split (/\<a name=\"*([^\>\"])\"*\>/i, $_);
		print "$ANAME:$ATEXT\n";
	}
}
close (SFILE);
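
One thing that stands out in the pattern above is that ([^\>\"]) has no
quantifier, so it can capture only a single character and will not match a
name such as "cover". What follows is a minimal sketch of one way to build
the anchor-name => text list and search it, not a tested MacPerl solution.
The names anchors_to_hash, find_anchors and %text_for are made up for
illustration, the sample HTML is inlined in a here-document instead of being
read from anchorTest.htm, and the regex assumes anchors of the simple form
<a name=foo> or <a name="foo">. It relies on split returning the captured
anchor names interleaved with the text pieces when the pattern contains
parentheses.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a hash of anchor name => text that follows
# that anchor, up to the next <a name=...> tag or the closing </body>.
sub anchors_to_hash {
    my ($html) = @_;

    # Keep only what lies between <body ...> and </body>, if present.
    my ($body) = $html =~ m{<body[^>]*>(.*?)</body>}is;
    $body = $html unless defined $body;

    # Because the pattern has capturing parentheses, split returns:
    #   (text before first anchor, name1, text1, name2, text2, ...)
    my @parts = split /<a\s+name\s*=\s*"?([^">\s]+)"?\s*>/i, $body;
    shift @parts;    # discard whatever preceded the first anchor

    my %text_for;
    while (@parts) {
        my $name = shift @parts;
        my $text = shift @parts;
        $text = '' unless defined $text;
        $text =~ s/<[^>]*>//g;      # strip remaining tags such as <p>, </a>
        $text =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
        $text_for{$name} = $text;
    }
    return %text_for;
}

# Hypothetical helper: return the sorted anchor names whose text
# contains the literal string $wanted.
sub find_anchors {
    my ($wanted, %text_for) = @_;
    return sort grep { index($text_for{$_}, $wanted) >= 0 } keys %text_for;
}

# Sample input copied from the message (assumption: inlined here rather
# than slurped from anchorTest.htm).
my $html = <<'END_HTML';
<html>
	<head>
		<title>
			Anchor Test</title>
		</head>
	<body bgcolor=white>
		<a name=cover>
			<p>
				"Cover Page"</p>
			</a>
		<a name=toc>
			<p>
				"Table of Contents"</p>
			</a>
		</body>
	</html>
END_HTML

my %text_for = anchors_to_hash($html);
print "$_: $text_for{$_}\n" for sort keys %text_for;
print "Found in: ", join(', ', find_anchors('Contents', %text_for)), "\n";
```

To run it against the real file, the here-document could be replaced by
slurping anchorTest.htm into $html (for example by setting $/ to undef
before reading). A regex like this is only good for simple, regular markup;
nested or oddly quoted anchors would need a real HTML parser.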


Can anybody help?

Thanx

Dave



