
Re: [MacPerl] Memory problems in OO context



On Fri, 19 Nov 1999 18:35:04 GMT, bart.lateur@skynet.be (Bart Lateur) wrote:

> On Thu, 18 Nov 1999 01:34:09 -0600 (CST), Matthew Langford wrote:
>
>>To get the whole deal I had to use more than 55 MB of memory.  (I think
>>this is what I came back to; for a while I had to crank out some virtual
>>memory and set it higher.)  The hash, after I finally got it, takes up 4.6
>>MB. If I read in the file, I don't have too many memory problems.
>>Clearly, there are serious memory leaks here in the "getting" stage.
>
> That doesn't prove a thing. I tried HTML::TreeBuilder on a 15k HTML
> file, on a PC (DOS Perl), and I got an out of memory error, with many
> megabytes available for Perl.
>
> This HTML parsing seems to be very memory-hungry. My guess is that this
> is because every HTML tag is a not-so-small object in memory. A lot of
> those objects, and you're quickly out of memory.
>
> Just give an acceptable proof that there IS indeed a memory leak, e.g.
> by parsing the same file over and over again, clearing memory (=
> releasing objects) every time, and watching it run out of memory after
> many loops.

True. And I remember proving exactly that a year or two ago and reporting
it to the list. There was, as far as I remember, not much of a reaction,
though.
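
A test along the lines Bart suggests would be something like this (a
minimal sketch only; the file name and the loop count are placeholders,
and I am assuming the parse_htmlfile() function from HTML::Parse, the same
one used in the snippet further down):

use HTML::Parse;    # provides parse_htmlfile()

for my $i (1 .. 500) {
	my $tree = parse_htmlfile("test.html");
	undef $tree;    # drop the only reference we hold; if memory still
	                # climbs after many passes, the tree is not being freed
	print "pass $i\n";
}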

What I tried to do back then was to loop over a bunch of files, strip all
HTML code from them (with HTML::FormatText) and dump the plain text into a
single text file.
The point is that I am looping over files, which should mean that I start
from zero on every new iteration, but I still lost several hundred KB on
each one. I tried purposely undef'ing all the variables and closing the
filehandles at the end of each iteration, but to no avail.

The solution I came up with, and which I am still using, is an extremely
ugly one: an AppleScript that calls the droplet containing the Perl script
with a list of about 10 files per batch, quits MacPerl, restarts it and
starts on the next batch. It's ridiculous, but it has worked every day for
months.

I don't know what is to blame, MacPerl or the HTML module(s) or both, but
there is no doubt that there have been huge memory leaks in this area for
years.

[later]

Oops, after rereading the related pod, it seems I have finally found a hint
in Gisle Aas's HTML::Element:

>BUGS
>
>If you want to free the memory associated with a tree built of
>HTML::Element nodes then you will have to delete it explicitly.  The
>reason for this is that perl currently has no proper garbage collector,
>but depends on reference counts in the objects.  This scheme fails because
>the parse tree contains circular references (parents have references to
>their children and children have a reference to their parent).

(maybe this note was not there in earlier versions of the module...[shrug])
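
For what it's worth, the explicit cleanup the BUGS section asks for seems to
come down to something like this (a sketch only; I am assuming the delete()
method on the tree that parse_htmlfile() returns):

use HTML::Parse;    # provides parse_htmlfile()

$tree = parse_htmlfile("some.html");    # "some.html" is just a placeholder
# ... use the tree ...
$tree->delete;    # break the parent <-> child circular references
undef $tree;      # now the reference count can actually reach zero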

A practical problem still remains: what does this mean in relation to
HTML::FormatText?
Looking at the following snippet...

use File::Basename;     # basename()
use HTML::Parse;        # parse_htmlfile()
use HTML::FormatText;

foreach $file (@ARGV) {
	# skip directories and index pages
	if (-f $file && $file !~ /index\.htm./) {

		# $outfile is set further up in the script
		open (OUT, ">>$outfile") or die $^E;
		print OUT basename($file);

		# parse the HTML file into a tree and render it as plain text
		$html = parse_htmlfile($file);
		$formatter = HTML::FormatText->new(leftmargin => 0,
			rightmargin => 100000);
		$text = $formatter->format($html);
		undef $html;
		undef $formatter;

		# further GREP-wise cleaning-up snipped

		print OUT $text, "÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷\n\n";
		undef $text;
		close OUT;
	}
}

...I don't see how I could possibly free up more memory. Am I doing
something wrong here?
Ideas, anybody?

I am in digest mode, so please cc me.



__Peter Hartmann ________

mailto:hphartmann@arcormail.de


