
Re: [MacPerl] Extracting elements in a HTML page.




Dear Dave & other MacPerl'ers,

A week or so back, Dave Johnson inquired about using HTML::Parser.

There wasn't much discussion on the list, and possibly not much interest, but
if anyone is still having trouble using the Parser or making sense of
subclasses, the following might be of some value.

I wrote a script and a module (HTMLDump.pl, HTMLDump.pm) that together dump
out the complete parse of an HTML file in (I hope) a fairly readable format.
The code serves as a simple working example of an HTML::Parser subclass
and client script, and the output of HTMLDump can be helpful in debugging
your own application of the Parser: it helps to know what stream of events
the Parser is generating (or not generating) for a given HTML file or
construct.  It's especially useful (and revealing) if you're dealing with
JavaScript <script> data, which sometimes parses in surprising and
inconsistent ways.  It also shows how 'text' chunks sometimes get broken
up at arbitrary points by the Parser.
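For anyone who hasn't subclassed HTML::Parser before, here is a minimal sketch of the idea. It is not the actual HTMLDump.pm (which isn't included in this post); the package name HTMLDumpSketch and the emit/lines helpers are hypothetical. It assumes an HTML::Parser 3.x install, where calling new() with no arguments sets up the version-2-style callback methods start, end and text, which the subclass overrides.

```perl
#!/usr/bin/perl
# Sketch of an HTMLDump-style subclass of HTML::Parser.
# HTMLDumpSketch, emit() and lines() are hypothetical names,
# not part of rkm's HTMLDump.pm or of HTML::Parser itself.
use strict;
use warnings;

package HTMLDumpSketch;
use base 'HTML::Parser';

# Append one line of dump output to a buffer kept in the object hash.
sub emit {
    my ($self, $line) = @_;
    push @{ $self->{dump_lines} }, $line;
}

# Return every dump line collected so far.
sub lines {
    my $self = shift;
    return @{ $self->{dump_lines} || [] };
}

# Version-2-style callbacks, called as methods because the
# parser was created with new() and no arguments.
sub start {
    my ($self, $tag, $attr, $attrseq) = @_;
    $self->emit("<$tag>");
    # Attributes in source order; names arrive lowercased.
    $self->emit(qq{    Attr->{$_} => "$attr->{$_}"}) for @$attrseq;
}

sub end {
    my ($self, $tag) = @_;
    $self->emit("</$tag>");
}

sub text {
    my ($self, $text) = @_;
    (my $shown = $text) =~ s/\n/\\n/g;    # show newlines as \n
    $self->emit(qq{    text    "$shown"});
}

package main;

# The "client script" part: feed the parser, then flush it with eof().
my $p = HTMLDumpSketch->new;
$p->parse(qq{<form name="buttonbar">\n<center>Hi</center>\n</form>});
$p->eof;
print "$_\n" for $p->lines;
```

The subclass only overrides the callbacks it cares about. To dump a file from disk instead of a string, parse_file() can be called in place of parse() and eof().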

If anyone could use a copy of this utility (.pl, .pm, droplet), just send
me a note.

rkm

FYI: sample output below, the HTML followed by its parse dump, taken from the
javascript3 documentation.


----------------------------------------------------------------------------
----------------------------------------------------------------------------
<title>Button Bar</title>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#FF0000"
ALINK="#FF0000">
<form name="buttonbar">
<center>
<script>
// state 1 is contents showing, state 2 is no frame, state 3 is index showing
if (parent.state == 1) {
   document.write('<input type="button" name="hc" value="Hide Contents" onClick="parent.state=2; parent.frames[0].location=parent.frames[0].frames[1].location; history.go(0)">')
   document.write('<input type="button" name="si" value="Show Reference" onClick="parent.state=3; parent.frames[0].frames[0].location=\'alpha.html\'; history.go(0)">')
}
else if (parent.state == 2) {
   document.write('<input type="button" name="sc" value="Show Contents" onClick="parent.state=1; parent.frames[0].location=\'content.html\'; history.go(0)">')
   document.write('<input type="button" name="si" value="Show Reference" onClick="parent.state=3; parent.frames[0].location=\'content.html\'; history.go(0)">')
}
else if (parent.state == 3) {
   document.write('<input type="button" name="hi" value="Hide Reference" onClick="parent.state=2; parent.frames[0].location=parent.frames[0].frames[1].location; history.go(0)">')
   document.write('<input type="button" name="sc" value="Show Contents" onClick="parent.state=1; parent.frames[0].location=\'content.html\'; history.go(0)">')
}
</SCRIPT>
</form>
----------------------------------------------------------------------------
----------------------------------------------------------------------------
HTMLDump of document "g3:WorkingSet:Perl:Chris.log:javascript3:navbar.html"

<<HTMLDump>>

<title>
    text    "Button Bar"
</title>
    text    "\n"
<body>
    Attr->{bgcolor} => "#FFFFFF"
    Attr->{text} => "#000000"
    Attr->{link} => "#0000FF"
    Attr->{vlink} => "#FF0000"
    Attr->{alink} => "#FF0000"
    text    "\n"
<form>
    Attr->{name} => "buttonbar"
    text    "\n"
<center>
    text    "\n"
<script>

text    "\n// state 1 is contents showing, state 2 is no frame, state 3 is index showing\nif (parent.state == 1) {\n   document.write('"

<input>
    Attr->{type} => "button"
    Attr->{name} => "hc"
    Attr->{value} => "Hide Contents"
    Attr->{onclick} => "parent.state=2; parent.frames[0].location=parent.frames[0].frames[1].location; history.go(0)"

    text    "')\n   document.write('"
<input>
    Attr->{type} => "button"
    Attr->{name} => "si"
    Attr->{value} => "Show Reference"
    Attr->{onclick} => "parent.state=3; parent.frames[0].frames[0].location=\'alpha.html\'; history.go(0)"

    text    "')\n}\nelse if (parent.state == 2) {\n   document.write('"
<input>
    Attr->{type} => "button"
    Attr->{name} => "sc"
    Attr->{value} => "Show Contents"
    Attr->{onclick} => "parent.state=1; parent.frames[0].location=\'content.html\'; history.go(0)"

    text    "')\n   document.write('"
<input>
    Attr->{type} => "button"
    Attr->{name} => "si"
    Attr->{value} => "Show Reference"
    Attr->{onclick} => "parent.state=3; parent.frames[0].location=\'content.html\'; history.go(0)"

    text    "')\n}\nelse if (parent.state == 3) {\n   document.write('"
<input>
    Attr->{type} => "button"
    Attr->{name} => "hi"
    Attr->{value} => "Hide Reference"
    Attr->{onclick} => "parent.state=2; parent.frames[0].location=parent.frames[0].frames[1].location; history.go(0)"

    text    "')\n   document.write('"
<input>
    Attr->{type} => "button"
    Attr->{name} => "sc"
    Attr->{value} => "Show Contents"
    Attr->{onclick} => "parent.state=1; parent.frames[0].location=\'content.html\'; history.go(0)"

    text    "')\n}\n"
</script>
    text    "\n"
</form>
    text    ""
    text    "\n\n"


<</HTMLDump>>

----------------------------------------------------------------------------
----------------------------------------------------------------------------
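The dump above shows the version of HTML::Parser used here parsing the markup inside the document.write() strings as real <input> tags, and splitting text at odd points (note the empty text "" event near the end). As I understand the HTML::Parser 3.x documentation, later releases treat the contents of <script> as literal text, and the unbroken_text option asks for each run of text to be delivered as a single event. The sketch below assumes a 3.x install and uses the api_version 3 handler interface rather than a subclass; it is not part of HTMLDump.

```perl
#!/usr/bin/perl
# Sketch assuming HTML::Parser 3.x; not part of HTMLDump.
use strict;
use warnings;
use HTML::Parser ();

my @events;    # one string per parser event, in order

my $p = HTML::Parser->new(
    api_version => 3,
    start_h => [ sub { push @events, "start $_[0]" }, 'tagname' ],
    end_h   => [ sub { push @events, "end $_[0]" },   'tagname' ],
    text_h  => [ sub { push @events, "text $_[0]" },  'text' ],
);

# Report each run of text as one event instead of several chunks.
$p->unbroken_text(1);

# Feed the input in two pieces, as happens when a file is read in blocks.
$p->parse(q{<script>document.write('<input type="button">')});
$p->parse(q{</script>done here});
$p->eof;    # flush the trailing text

print "$_\n" for @events;
```

With this setup the <input ...> inside the script should arrive as part of a single text event rather than as a start tag, which is usually what you want when you are only collecting the script source.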

