
[MacPerl] LWP::Simple - conclusion



Ok, here is the conclusion to the LWP::Simple posting.  :-)

Recap:

My wife decided she wanted to bet at the dog races.  I
volunteered to write a small database program for her to
keep track of how well the dogs did.  To do this I had to
download the history files from http://gulfgreyhound.com.
I wrote a small program to do so using LWP::Simple's get()
function since, as Paul pointed out to me a while back, why
re-invent the wheel?  LWP::Simple can download web pages
easily, so why go through doing all of the SOCKET stuff?
Anyway.....

So I wrote this program and ran it after downloading a
couple of files and determining how the data looked.  At
the time, MacPerl had 8192K in both of its memory size
fields.  The program downloaded 867 files and then ran out
of memory.  "Ok," I thought, "I'll just give it some more."
So I upped the memory to 12MB.  I ran it again and got
another out-of-memory message after downloading only 820
files.  "That's strange," I thought to myself as I doubled
the memory to 24MB.  I ran it again.  Out of memory again,
and only 967
files were downloaded.  "This....is bad," I thought to
myself.

So off I went to try to figure out why the program did
this.  I began putting in debugging PRINT statements, kept
a history file, and wrote out both the new web pages I was
going to and those I'd already visited.  I then further
modified the program so I could stop everything and dump
all of this information out to various files so I could
start the program back up at any point.  Then I noticed
that when the program went whacko, it began to write the
datasets out to the history file as well as printing them
to the correct file.  "Whatever is causing this is stomping
on memory," I thought.

So Friday night came and went.  Saturday, early, I got up
and continued working on this.  I made little headway but
finally turned to ZoneRanger.  ZoneRanger has two functions
which are fairly good: 1) a graphical display so you can
watch memory being used and freed, and 2) an option to watch
for memory leaks.  BTW:  ZoneRanger is free.  You can get
it from info-mac.

So Saturday fled by.  By this time I had thought that maybe
there was a variable in LWP which was being set as a global
variable.  So I searched through all of LWP but didn't find
any.  I did find a few BLESS statements (which turn a
reference into an object of a class) that did not have a MY
statement on them, but I left them alone because they were
probably needed.  I did find, though, that most of the LWP
and HTTP routines create a $request and a $response.  The
$request describes what to fetch, and the $response holds
the information the server sends back.  So I wondered if
the MY command was acting up.  I put some UNDEFs into the
routines just before they returned and just before the last
statement.  I UNDEF'd the $request variable just to see
what would happen.  Well, ZoneRanger reported that memory
was handled better (i.e., things got freed faster), but
eventually the program still ran out of memory, which meant
I wasn't any closer to a solution.

Sunday came.  I continued to explore the bowels of LWP,
HTTP, and other routines trying to locate the offending
code.  All to no avail.  By this time I had checked my
program repeatedly for syntax and logic errors.  I went
through several of the files and could not find anything
wrong.  So finally, I moved the program over to my Linux box
and ran it there.  At around the 820th file, memory started
to be chewed up.  The Linux box has 128MB of memory (64MB
RAM and 64MB swap space).  The program used 56% of that
memory before it reached the 1200th file and stopped.  (That
was a limit I had set - it really hadn't finished
downloading files.)  I was overjoyed.  The same error on two
systems meant, I figured, that there was some problem either
in Perl or LWP.  I couldn't find it - but I could reproduce
the error any time I wanted to.  So I wrote to Matthias and
went to bed happy.

But obviously my subconscious wouldn't give up on the
problem, and sometime during my sleep a little voice
told me what was wrong.  I woke up, thought about it, and
said a few cuss words - then went back to sleep.  The next
day proved what that little voice had said.  So I wrote
Matthias back again and said I'd post what happened.  The
problem is this:

When I downloaded the files from the site I checked them
out to see what kind of information they kept.  It was a
couple of title lines, a couple of lines of dashes, and
then just plain alphanumeric data.  Only the HTML web pages
contained HTTP:// links, and the site only used plain
<a href="...">...</a> links to the various sections of its
web site.  So I would split the incoming information on the
double quotes which surrounded the HTTP reference.  What I
didn't know was that in the history files (those larger
than 200K), an additional line of information was placed
into the files.  You guessed it - it was an
HTTP://www.gulfgreyhound.com line.

So the program was executing exactly as written.  When one
of these files was downloaded, the program would split it
on double quotes (of which there weren't any), so the whole
file came back as a single element.  It would then do a
search via the

	if( $theDoc[$i] =~ /http:\/\//i ){

test and find that yes, indeed, there was an HTTP:// line
in that ~200K file.  It would then dutifully stick the
entire file into the array of new places to go, which would
then make it appear in the history file when the program
attempted to go to that "web page."  That is why it looked
as though the program was stomping on memory.  Since this
happened every single time a history file was downloaded,
memory was quickly consumed.  There are somewhere around
2000 files at this site, with more being added every day.
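To make the failure concrete, here is a minimal sketch (a
hypothetical reconstruction, not the original program) of
the split-on-quotes step built around the same if test.
The sub name naive_links and both sample documents are made
up; the fake history file just mimics the layout described:
a title line, dashes, one stray URL line, then plain data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical reconstruction of the link-finding step:
# split a downloaded document on double quotes, then queue every
# piece that contains "http://" as a new place to visit.
sub naive_links {
    my ($doc) = @_;
    my @theDoc = split /"/, $doc;
    my @queue;
    for my $i ( 0 .. $#theDoc ) {
        if ( $theDoc[$i] =~ /http:\/\//i ) {
            push @queue, $theDoc[$i];
        }
    }
    return @queue;
}

# HTML page: the URL sits between double quotes, so it is split out cleanly.
my $html = '<a href="http://www.gulfgreyhound.com/results.html">Results</a>';

# Made-up history file: no double quotes at all, one stray URL line,
# then roughly 200K of plain data.
my $history = "Race history\n--------------\n"
            . "http://www.gulfgreyhound.com\n"
            . ( "1 SOME DOG 31.45 2 3\n" x 10000 );

my @from_html    = naive_links($html);
my @from_history = naive_links($history);

print "HTML page queued: $from_html[0]\n";
printf "History file queued one \"URL\" that is %d bytes long\n",
    length $from_history[0];
```

Because split finds no double quotes in the history file, it
returns the entire file as one element, and that element
passes the http:// test, so the whole file is queued as if
it were a single address.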

There are many ways to fix this.  I chose to resplit the
information on whitespace, since a URL cannot contain any
whitespace.  But I could also have just checked whether the
filename ended with ".txt".  In either case though -
LWP::Simple works fine.  It was a programmer error.  :-)
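Both fixes can be sketched in a few lines of Perl.  This is
a hedged illustration, not the actual program: the sub names
links_by_whitespace and should_scan_for_links, the
URL-trimming regex, the sample documents, and the assumption
that history files end in ".txt" are all hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fix 1 (the one chosen in the post): split on whitespace instead of
# double quotes.  A URL cannot contain whitespace, so each token is small.
# Assumption: this regex is hypothetical; it trims the href="...">
# debris left around a URL token in the HTML pages.
sub links_by_whitespace {
    my ($doc) = @_;
    my @links;
    for my $token ( split /\s+/, $doc ) {
        if ( $token =~ m{(http://[^"'<>\s]+)}i ) {
            push @links, $1;
        }
    }
    return @links;
}

# Fix 2 (the alternative mentioned): don't scan plain-text history
# files for links at all.  Assumes history files end in ".txt".
sub should_scan_for_links {
    my ($filename) = @_;
    return $filename !~ /\.txt\z/i;
}

my $html = '<a href="http://www.gulfgreyhound.com/results.html">Results</a>';
my $history = "Race history\n--------------\n"
            . "http://www.gulfgreyhound.com\n"
            . ( "1 SOME DOG 31.45 2 3\n" x 10000 );

my @html_links    = links_by_whitespace($html);
my @history_links = links_by_whitespace($history);

print "HTML page links:    @html_links\n";
print "History file links: @history_links\n";
print "Scan history.txt? ",
    ( should_scan_for_links('history.txt') ? "yes" : "no" ), "\n";
```

With whitespace splitting, the history file contributes only
the short home-page URL instead of the entire file; a
visited-pages check like the history file described earlier
would keep that URL from being fetched twice.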

For people just starting out - this is a good example of
just how long some of these programs can take to debug.  I
spent approximately 40 hours trying to get this program to
work and looking in the wrong place for the answer to why
the program didn't work.  This is not to say that you would
or should do the same thing, but it is to say: hang in
there and keep trying.  Sometimes it just takes a while to
get a program to work properly.  :-)

So anyway - that's the story of LWP::Simple!  Have fun!
I'm off to the next part of this program (and probably a
lot more mistakes!).  :-)

