
[MacPerl] efficiency & memory (LONG)



On Sat, 22 Aug 1998, Vincent Nonnenmacher wrote:

> >But no matter what, MacPerl has no excuse for crashing when it runs out of
> >memory or stack space or whatever.  Good Programs don't do that, they
> 
> Instead of saying that, could you pinpoint and narrow down a sample case
> that exhibits the nasty MacPerl behaviour?
> Armed with a good, precise example, one could make MacPerl safer; but how
> could such a jewel deal with bad code and outrageous memory usage?

Well, sort of.  I'm afraid I've just discovered that I deleted the old
droplet.  Maybe it will turn up somewhere else.  But I have the fixed
version, which is close.

> are you over abusing foreach list constructs ?

No.

> are you gulping an entire file in memory to read it line by line ?

Yes.  No.  Maybe.  I don't know how to answer that.  (obscure movie quote) 

Part of the problem the Perl script was solving was changing line endings,
and line endings are part of the input file format (a record is terminated
by a new line, which we can even assume is \n by the time the RE hits it). 
But some of the fields I'm extracting are multiline fields, so I do have
to read in, well, the whole file.  And embedded line endings have to be
converted as well as database-specific line endings. 
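For the line-ending step on its own, here is a minimal sketch.  It assumes
the three usual conventions (CRLF, bare CR, bare LF) are the only ones
present; normalize_eol is a hypothetical name, not something from the
original droplet.  The octal escapes are deliberate, since "\n" does not
mean the same byte on every platform.

```perl
use strict;

# Hypothetical helper: map CRLF, bare CR, and bare LF to one chosen ending.
# Octal escapes are used because "\n" is platform-dependent.
sub normalize_eol {
    my ($text, $eol) = @_;
    $eol = "\012" unless defined $eol;    # default to LF (an assumption)
    # CRLF comes first in the alternation so it is not treated as CR then LF.
    $text =~ s/\015\012|\015|\012/$eol/g;
    return $text;
}

my $mixed = "a\015\012b\015c\012d";
my $clean = normalize_eol($mixed);        # every ending is now LF
```

Because it runs over the whole string, this converts the line endings
embedded in multiline fields as well as the ones that terminate records.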

Consider, if you will, the simple comma-delimited values file:


"field 1","field 2","field 3","last field"\n
"2nd record fld1","r2f2","r2f3","r2f4"\n
"3rd record fld1","notice embedded ""quotes"" are allowed","r3f3","r3f4"\n
"4th record fld1","so are embedded newlines\n
which can make your life\n
complicated","r4f3","r4f4"\n
"5th record fld1","blank fields have no quotes",,\n
25,"note that pure numbers and possibly some strings\n
can be represented without quotes, as in field 1 here","r6f3","r6f4"\n


In addition, I want to be able to change how many fields are in each
record (since this has already changed many times).  This means that the
same RE has to look for either the comma or a new line (since it doesn't
know which it will be seeing), but the logic must handle them differently.

One simplification I have made is that the only bareword/barenumber is the
first field in each record; all other fields except blank ones use quotes. 

Lest that seem too easy, I wanted to store each parsed record in a hash
using the name of the record, which is the second field.

I'll leave writing the regular expression to handle this as an exercise
for the reader.  :)  Is there a CPAN module that tames this hairy monster?
I'm pretty sure there wasn't one nearly a year ago.

The regular expression that crashed MacPerl had lookahead assertions like
(?=\") so I could tell whether a quote was ending the field or was
announcing an embedded quote.  It might have looked something like this:

while ( /^"(.+?)"[,\n](?=["\d])/s)     # Don't try this at home, kids!

where this _while_ statement was stripping off a field at a time, and
trying futilely to avoid tripping over the wrong end-of-lines and such. 
(To those scratching their heads, I did set $_ = $' so that we were
continually at the beginning of the string.  Since I was already using
lookaheads, it didn't seem like this was any additional penalty.)
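For comparison, here is a sketch of one way to pull fields off a record
without lookaheads and without $_ = $', which copies the rest of the string
on every pass.  It scans in place with \G and the /gc modifiers instead.
This is not the original script: parse_record and %by_name are hypothetical
names, and it assumes it is handed one complete record at a time, with
bare fields containing no quotes, commas, or newlines.

```perl
use strict;

# Hypothetical helper: split one complete record into its fields.
sub parse_record {
    my ($rec) = @_;
    my @fields;
    $rec =~ s/\n\z//;                 # drop the record terminator, if any
    pos($rec) = 0;
    while (1) {
        # Quoted field: runs of non-quotes separated by doubled quotes.
        if ($rec =~ /\G"([^"]*(?:""[^"]*)*)"/gc) {
            (my $f = $1) =~ s/""/"/g; # collapse "" to a single "
            push @fields, $f;
        }
        # Bare field, possibly empty (a blank field or a plain number).
        elsif ($rec =~ /\G([^",\n]*)/gc) {
            push @fields, $1;
        }
        # A comma means another field follows; anything else ends the scan.
        last unless $rec =~ /\G,/gc;
    }
    return @fields;
}

# Store each record under its name, taken here to be the second field.
my %by_name;
my @f = parse_record(qq{"3rd record fld1","notice embedded ""quotes"" are allowed","r3f3","r3f4"\n});
$by_name{ $f[1] } = [@f];
```

The quoted branch never needs to look ahead: a doubled quote is consumed as
part of the field, so the first lone quote it reaches is the closing one.
A malformed record (say, text straight after a closing quote) is silently
cut short at that point; a real script would want to complain instead.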

I worked around the problem by redefining the input file.  :)
I can actually change the delimiters, which I did to [ instead of " and *
instead of , .  I chose these because I could guarantee there were no
occurrences of [ inside the fields; and since this wasn't C/C++ code, few
*'s.  The RE is laughably simple now, and I just have to set Perl's memory
partition to 15 MB or so to process one 1.5 MB file.  I should note that 
if I don't set the memory partition properly with the simple RE, it does 
politely tell me "Out of memory."  It does its job.

If anyone has ideas about how to avoid undef'ing $/ (and so slurping the
whole file), I'm open.  But that's not Mac-specific, I guess, while MacPerl
crashing is.  I suppose you could read it a line at a time, and concatenate
lines if you determined they were continuations.  But you would have to
know it was a continuation, since the syntactical meaning of commas and
quotes changes when it is.
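One way to tell a continuation, sketched under assumptions: if bare fields
never contain a quote, then a record is complete exactly when the number of
double quotes read so far is even, because an embedded "" adds two and an
open field leaves the count odd.  read_record is a hypothetical name, and
the sketch assumes $/ already matches the file's line ending.

```perl
use strict;

# Hypothetical helper: return the next logical record from a filehandle,
# joining physical lines until the double quotes balance.
sub read_record {
    my ($fh) = @_;
    my $rec    = '';
    my $quotes = 0;
    while (defined(my $line = <$fh>)) {
        $rec    .= $line;
        $quotes += ($line =~ tr/"//);     # tr counts without changing $line
        return $rec if $quotes % 2 == 0;  # even count: no field still open
    }
    # End of file: hand back an unterminated tail if there is one.
    return length($rec) ? $rec : undef;
}

# Usage sketch.  The in-memory filehandle needs Perl 5.8 or later; a handle
# opened on a real file works the same way.
my $data = qq{"r1f1","r1f2"\n"r2f1","two\nlines","r2f3"\n};
open(my $in, '<', \$data) or die "cannot open: $!";
my @records;
while (defined(my $r = read_record($in))) {
    push @records, $r;
}
close($in);
```

Only one record is held in memory at a time, so the partition needs to be
big enough for the largest record rather than for the whole file.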


> I've been using Perl and MacPerl for years and never complained about
> memory problems; I always find a way to lighten the burden on Perl when I
> run up against the wall. Remember that conquest is facilitated by division.

Lovely.  But just because you've found ways around the bugs doesn't mean
they shouldn't be fixed, does it?  I, too, worked around them.  But in the
future, I would prefer graceful exits to reboots.  In particular, since
the regular expressions I was writing were legal Perl, I didn't like the
icy hand of the debugger across the face (sorry, too many metaphors in my
soup).  Not like, say, a pointer out of my program space in C, which is
semantically illegal, and which I accept my duty to correct.

In fact, I would love to get help (therapy?) for my ailing scripts; but
unfortunately, the charter of the mailing list seems to force my
complaints into one forum but suggestions for solutions into another. 
Oops--don't respond to that, since Matthias doesn't want me picking fights!
;^)  Really, though, I like MacPerl, and hope to use more of the Mac Toolbox
features soon, in a Web data-harvesting Mac app.  But MacPerl can get
gloriously better--what is more exciting, it seems to be headed that way!

Such was the sad tale of my crash-the-MacPerl script.  I'm now in the
happily-ever-after (of that script)--I'm glad it's not on CPAN.  ;)
No doubt, after seeing small pieces of the code, you are too.



--
MattLangford 




