
Re: [MacPerl] Extracting elements in an HTML page.



At 7:32 AM -0400 4/21/99, Chris Nandor wrote:
>At 18.08 -0400 1999.04.20, Dave Johnson wrote:
>>I need to extract specific elements of a Web page. It appears that
>>HTML::Parser would be ideal for this. But I'm afraid to admit that
>>despite three Perl books, as many days and who knows how many pod
>>files and web searches, I can't figure out how to make it work.
>>
>>Can anyone point me to a couple of examples using HTML::Parser?
>
>Well, Perl Cookbook has some examples.  It's not the easiest module to get,
>and the documentation is not very clear.  Maybe if you asked how to do
>something specific, someone could give you a step in the right direction.
>
Chris;

I have been using an application called "WebMiner" with AppleScript. It
allows you to download a web page and then extract data from specific elements
such as (in AppleScript):

set ClosePrice to the contents of cell 8 of table 2 of theDoc
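
For comparison, here is a minimal sketch of the same lookup done in Perl with HTML::Parser. It assumes the 3.x callback interface (handlers passed to new() with api_version => 3); earlier releases were used by subclassing HTML::Parser and overriding its start/end/text methods instead. The helper name table_cell is hypothetical, and so is the numbering rule: tables are counted in the order their <table> tags appear, and cells (<td> or <th>) in source order across all rows. WebMiner's own numbering may differ.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Return the text of cell $want_cell in table $want_table (both 1-based).
# Assumption: tables are numbered in the order their <table> tags appear,
# and cells (<td> or <th>) are counted in source order across all rows.
sub table_cell {
    my ($html, $want_table, $want_cell) = @_;
    my @open;           # numbers of the tables currently open (innermost last)
    my $tables = 0;     # <table> start tags seen so far
    my $cells  = 0;     # cells seen so far in the wanted table
    my $inside = 0;     # true while inside the wanted cell
    my $text   = '';

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tag) = @_;
            if ($tag eq 'table') {
                push @open, ++$tables;
            }
            elsif (($tag eq 'td' || $tag eq 'th')
                   && @open && $open[-1] == $want_table) {
                $cells++;
                # A new cell also ends the previous one if its </td> was omitted.
                $inside = ($cells == $want_cell);
            }
        }, 'tagname' ],
        end_h => [ sub {
            my ($tag) = @_;
            my $in_wanted = @open && $open[-1] == $want_table;
            if ($tag eq 'table') {
                $inside = 0 if $in_wanted;
                pop @open;
            }
            elsif ($in_wanted && ($tag eq 'td' || $tag eq 'th')) {
                $inside = 0;
            }
        }, 'tagname' ],
        # 'dtext' is the text with entities such as &amp; already decoded.
        text_h => [ sub { $text .= shift if $inside }, 'dtext' ],
    );
    $p->parse($html);
    $p->eof;            # flush any text still buffered by the parser
    return $text;
}

my $html = '<table><tr><td>a</td></tr></table>'
         . '<table><tr><td>Open</td><td>Close</td></tr>'
         . '<tr><td>41.00</td><td>42.50</td></tr></table>';
print table_cell($html, 2, 4), "\n";   # prints 42.50
```

Keeping a stack of open tables means cells of a table nested inside table 2 are not counted as cells of table 2; text of such a nested table inside the wanted cell is still collected.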

There is additional processing I wish to do on the data, and Perl would be
a better choice for it than AppleScript. My first
thought was to use your "Glue" to control WebMiner. However, it then occurred
to me that once I have retrieved a page, Perl should be at least as
good as WebMiner at extracting the information I need. Once I stumbled
on HTML::Parser, I thought most of the hard work was done.

Although I have figured out how to use some of the simpler methods, OOP is
not my strong suit, so the HTML::Parser documentation doesn't make any
sense to me. I've read chapter 20 of "Perl Cookbook" at least 10 times, and
the examples seem to be either of the "this is the wrong way to do it" type
or of the "here is a teaser; the solution is left to the student" type.

If there were an example that showed how to extract a single paragraph from
a webpage I think I could figure out the rest. But my searches for examples
have come up dry.
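
A minimal sketch of exactly that, again assuming the HTML::Parser 3.x callback interface: the parser calls a start handler for each opening tag, an end handler for each closing tag, and a text handler for the text in between, and the script only keeps text while it is inside the wanted <p>. The helper name extract_paragraph is hypothetical. Paragraphs are counted in source order; a <p> whose </p> is missing is treated as ending at the next <p>, while other implicit closings (for example by a <div>) are not handled in this sketch.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Return the text of the $want-th <p> element (1-based) in $html.
sub extract_paragraph {
    my ($html, $want) = @_;
    my $count  = 0;     # how many <p> start tags seen so far
    my $inside = 0;     # true while inside the wanted paragraph
    my $text   = '';

    my $p = HTML::Parser->new(
        api_version => 3,
        # Called for every start tag; 'tagname' passes the lowercased name.
        start_h => [ sub {
            my ($tag) = @_;
            if ($tag eq 'p') {
                $count++;
                # A new <p> also ends the previous one if </p> was omitted.
                $inside = ($count == $want);
            }
        }, 'tagname' ],
        # Called for every end tag.
        end_h => [ sub {
            my ($tag) = @_;
            $inside = 0 if $tag eq 'p';
        }, 'tagname' ],
        # Called for text; text inside <b>, <a>, etc. is still collected.
        text_h => [ sub { $text .= shift if $inside }, 'dtext' ],
    );
    $p->parse($html);
    $p->eof;            # flush any text still buffered by the parser
    return $text;
}

my $html = '<html><body><p>First <b>bold</b> para.</p>'
         . '<p>Second &amp; last para.</p></body></html>';
print extract_paragraph($html, 1), "\n";   # prints First bold para.
print extract_paragraph($html, 2), "\n";   # prints Second & last para.
```

The same pattern (count the tag you care about, set a flag in the start handler, clear it in the end handler, collect text while the flag is set) extends to headings, list items, or table cells.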

Thanks for your time.

Dave




