
Re: [MacPerl] Iterative Implementation of a Tail-Recursive $POST_MATCHed Split of a Slurped HTML Text



Dave Babbitt <babbitt@airmail.net> writes:
}Hi Guys!
}
}I'm having trouble extracting patterns from HTML. I am also confused with
}all the built-in $ stuff. All I want to do is create a key-value
}associative list with the anchor names as the key and the text between them
}as the value. Then I want to be able to search through the values and
}return the key if I have found what I am looking for. I can get the first
}"<a name=blah>" at the beginning of a $' by using $_ =~ /(<body[^>]*>[^<a
}]*)/i; but how do I do the rest? The html would look like this:
[snip]
}
}
}Can anybody help?

I *think* I see what you're trying to do.  Rather than building the
regexps myself, I'd probably use HTML::Parser, part of libwww-perl-5.  (The
MacPerl version is at
<ftp://mors.gsfc.nasa.gov/pub/MacPerl/Scripts/libwww-perl-5.08.sit.hqx>,
but the HTML modules in it are unchanged, so you can get them from CPAN if
you want.)  I've only used HTML::LinkExtor myself, but HTML::Parser looks
completely general, and you should be able to get what you want out of
it.  The docs are embedded in the module files themselves, so point Shuck
at the *.pm's to learn how to use them.  Also look at lwpcook.pod, the
libwww-perl-5 cookbook.
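
For what it's worth, here is a rough sketch of how the anchor-name-to-text
table might be built with HTML::Parser.  It assumes a current HTML::Parser
from CPAN (the version 3 handler interface, which is newer than the
libwww-perl-5.08 release mentioned above), and the sub names
anchors_to_text and find_anchors, as well as the sample HTML, are made up
for illustration since the original HTML was snipped.  Treat it as a
starting point rather than a tested MacPerl solution.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser ();

# Build a hash mapping each <a name=...> anchor to the text that follows
# it, up to the next named anchor. (Hypothetical helper, not part of LWP.)
sub anchors_to_text {
    my ($html) = @_;
    my (%section, $current);

    my $parser = HTML::Parser->new(
        api_version => 3,
        # On every start tag, look for <a name="...">.
        start_h => [ sub {
            my ($tag, $attr) = @_;
            if ($tag eq 'a' && defined $attr->{name}) {
                $current = $attr->{name};
                $section{$current} = '' unless exists $section{$current};
            }
        }, 'tagname, attr' ],
        # Append entity-decoded text to the most recent named anchor.
        text_h => [ sub {
            my ($text) = @_;
            $section{$current} .= $text if defined $current;
        }, 'dtext' ],
    );
    $parser->parse($html);
    $parser->eof;
    return \%section;
}

# Return the (sorted) anchor names whose text matches the given pattern.
sub find_anchors {
    my ($section, $pattern) = @_;
    my @hits = grep { $section->{$_} =~ $pattern } keys %$section;
    @hits = sort @hits;
    return @hits;
}

# Made-up sample input standing in for the snipped HTML.
my $html = <<'HTML';
<html><body>
<a name="intro"></a><p>Welcome to the guide.</p>
<a name="setup"></a><p>Install the modules first.</p>
</body></html>
HTML

my $section = anchors_to_text($html);
print join(', ', find_anchors($section, qr/modules/i)), "\n";
```

Text that appears before the first named anchor is simply dropped here; if
you need it, you could seed $current with some placeholder key instead.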

}
}Thanx
}
}Dave
}
}Stop medicating your pity with other people's money! Compassion = "com"
}(with) + "passion" (suffer). If you are not "suffering with" those you
}are trying to help, why are you congratulating yourself?
}
}
}
}***** Want to unsubscribe from this list?
}***** Send mail with body "unsubscribe" to mac-perl-request@iis.ee.ethz.ch


---
Paul J. Schinder
NASA Goddard Space Flight Center
Code 693, Greenbelt, MD 20771
schinder@pjstoaster.pg.md.us


