Re: [MacPerl] efficiency & memory




On Thu, 20 Aug 1998, Nathaniel Irons wrote:

> On the mac I typically get dumped into the debugger when datafile sizes
> exceed about 500K, though I do see graceful out of memory errors in
> about 1 in 5 attempts.  Under Linux, the script completes calmly in a
> few seconds no matter how big the file is.

Yes, I have seen stack<->heap collisions in MacPerl, based on regular
expressions I was running.  Out of curiosity, do you use a low-level
debugger like MacsBug?  Does it report a bad heap?

I found it easier to change the regular expression.  It would be nice if
MacPerl gracefully reported the problem rather than requiring a three-finger
salute, but for now...


<snippage>

> Tue Jul 14 20:00:18 1998 [28874] PASV

> #!perl -w
> 
> foreach $line (<>) {
> 
as someone else pointed out, you could use
  while (<>) {
      $line = $_;

here.  The foreach form evaluates <> in list context, so it slurps in the
whole file before the loop starts; the while form reads one line at a
time.  If the script isn't working now, though, it's worth a try.
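
A minimal sketch of the difference, assuming a Perl recent enough to open
a filehandle on an in-memory string (5.8 or later, so not the MacPerl
discussed here).  The names $data, $fh and the counters are made up for
illustration.  After one pass through each loop body, it counts how many
lines are still unread on the handle:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the log file.
my $data = "first\nsecond\nthird\n";

# while: scalar context, one line is read per iteration.
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my $seen_by_while = 0;
while ( my $line = <$fh> ) {
    $seen_by_while++;
    last;                               # stop after the first line
}
my $left_after_while = () = <$fh>;      # count the lines still unread
close $fh;

# foreach: list context, every line is read before the loop body runs.
open $fh, '<', \$data or die "cannot open in-memory file: $!";
my $seen_by_foreach = 0;
foreach my $line (<$fh>) {
    $seen_by_foreach++;
    last;                               # too late, the file is already read
}
my $left_after_foreach = () = <$fh>;
close $fh;

print "while left $left_after_while unread, ",
      "foreach left $left_after_foreach unread\n";
```

The while loop leaves the remaining lines on the handle; the foreach loop
has already consumed all of them by the time its body first runs.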


>     next if ($line =~ /^$/);
>     
>     $logID = $line;
>     $logID =~ s/^[^\[]*\[(\d{1,5}).*/$1/;

why the substitution?  why not
      ($logID) = $line =~ /\[(\d+)\]/;

do brackets show up anywhere else in the line?
And, especially, do you care if the digits of the number are exactly in
that range?  Might they need to get bigger down the road?  The {m,n} is
expensive; don't use it because you can, use it because you absolutely
must.
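
As a rough sketch of the capture approach, using the sample log line
quoted above: a match in list context returns its captures, so the id can
be assigned directly without copying the line and substituting.  The
defined() check is an addition of mine for lines with no bracketed number.

```perl
use strict;
use warnings;

# Sample line taken from the log excerpt quoted in this thread.
my $line = "Tue Jul 14 20:00:18 1998 [28874] PASV\n";

# In list context the match returns the captured group(s);
# $logID is undef if the pattern does not match.
my ($logID) = $line =~ /\[(\d+)\]/;

if ( defined $logID ) {
    print "logid is \"$logID\"\n";
} else {
    print "no id found\n";      # e.g. a blank or malformed line
}
```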

To quote the Camel book (on Time Efficiency):
"Avoid regular expressions with many quantifiers, or with big {m,n}
numbers on parenthesized expressions.  Such patterns can result in
exponentially slow backtracking behavior unless the quantified subpatterns
match on their first 'pass'."

I'm not saying this is Wrong, just that it is a Prime Candidate for
Suspicion.  And therefore the first thing in one of my programs that would
get pitched overboard to avoid crashing.  The errors I had in one of my
programs were due to evil (?!...) and (?=...) operators, and {m,n} seems
{kinder,gentler} than those, but one never knows.


> 
> #   $logID =~ s/[^\[]*(\d{1,5).*/$1/;   #better   
> 
> #   print "logid is \"$logID\"\n";
> 
>     push @master_lines, $line;
>     push @ids, $logID;
> 
> }
> 
> @sorted = @master_lines[ sort byID  0 .. $#ids];

Does this actually sort the lines?  Does it work on a very small file?
If it does, I guess TMTOWTDI.

I can follow sorting the index numbers, but I'm not sure what happens when 
the assignment @sorted = @master_lines[ @sortedindices ] gets evaluated.
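
For what it's worth, a small sketch of what that assignment does, assuming
made-up sample data (only the first line comes from the log excerpt in
this thread; the other two are hypothetical).  An array slice returns the
elements in the order the indices are listed, so slicing with sorted
indices yields the lines in sorted order:

```perl
use strict;
use warnings;

# First line is from the quoted log; the other two are hypothetical.
my @master_lines = (
    "Tue Jul 14 20:00:18 1998 [28874] PASV\n",
    "Tue Jul 14 20:00:19 1998 [112] LIST\n",
    "Tue Jul 14 20:00:20 1998 [5003] QUIT\n",
);

# Pull the bracketed id out of each line (0 if a line has none).
my @ids = map { /\[(\d+)\]/ ? $1 : 0 } @master_lines;

# Sort the index numbers by the id stored at each index ...
my @sorted_indices = sort { $ids[$a] <=> $ids[$b] } 0 .. $#ids;

# ... then the array slice returns the lines in that index order.
my @sorted = @master_lines[@sorted_indices];

print @sorted;
```

Here the ids are 28874, 112 and 5003, so the indices come back as 1, 2, 0
and the slice lists the lines in that order.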

What if you replaced this line with

   @sorted_offsets = sort byID 0 .. $#ids;
   foreach $index ( @sorted_offsets )  {
       print $master_lines[ $index ];   # each line still ends in its own newline
   }

(and wipe out this line below, too, of course)?

> 
> print "sorted:\n\n @sorted\n";
> 
> sub byID {
> 
> #   $counter++;
> #   print "called $counter\n";
> #   print "called $ids[$a]\n";
>     $ids[$a] <=> $ids[$b];
> 
> }


Hope this helps.


--
MattLangford 


