
[MacPerl] Cursed variables (Was: Easy regex?)



At 19.22 -0000 98-12-20, Robert Crews wrote:
>Hmm. . . You're right. I remembered the regex speed rules wrongly. Both
>Hall and Schwartz in _Effective Perl Programming_ and Friedl in
>_Mastering Regular Expressions_ note that match variables (those cursed
>three: $`, $&, and $') slow Perl scripts more than backreferences
>(variables $1 through $9). Friedl mentions that there's an additional
>speed penalty if you use more than nine backreferences (variables $10
>through $99). However, Hall and Schwartz also mention that you should
>benchmark your regular expressions. So I did, and found that the
>"cursed" variables execute nearly a third faster than the backreferences:

Well, the thing with $`, $&, and $' is that for them to work, perl must
copy every string you match a regex against, regardless of whether you use
those variables on that particular match, since the variables can be used
far away from the match.

If you're actually using one of them on every match of every regex, as you
do in your benchmarks, the copying has to be done anyway, and the cursed
code should be as fast as the 'blessed' code. (Or even faster, since perl
won't have to parse a '(' and a ')', and other small things.)

And since you match on one line at a time, the strings that have to be
copied aren't very long. If you tried matching on your whole history file,
I'd expect match_variables to be slower.

The speed penalty comes in when you use $& somewhere in a program, for
whatever good reason, and as a consequence slow down some other regex that
you run on long strings. This is especially true if the $& is in some
module that you have use'd, and not even visible in your own code.
'English' is one such module. (Or was. It might have been changed...)
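
On the English module: later perl releases document an import flag,
-no_match_vars, that keeps English from mentioning the three match
variables at all. Whether your MacPerl ships an English.pm that knows the
flag is an assumption you would have to check. A minimal sketch, with a
made-up $text string:

```perl
use strict;
use warnings;

# -no_match_vars asks English not to set up $PREMATCH, $MATCH and
# $POSTMATCH, so loading the module does not mention $`, $& or $'.
use English qw(-no_match_vars);

my $text = 'key=value';    # hypothetical input

# Plain captures still work as usual.
my ($key, $value) = $text =~ /^(\w+)=(\w+)$/;
print "$key -> $value\n";

# The other English names are still exported.
print "running as process $PROCESS_ID\n";
```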

So, if you are writing a simple run-once program, it's not that bad to use
$& et al. However, if you are writing a module, you should definitely take
the extra time needed to write (.*) instead of .*, and $1 instead of $&.
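
To make that concrete, here is a small sketch of the rewrite. The $line
string and the variable names are invented for illustration. Note that
capturing the text before and after the match still copies that text, but
only for this one regex, not for every regex in the program.

```perl
use strict;
use warnings;

my $line = 'error: disk full on /dev/hda1';    # hypothetical input

# Instead of /disk \w+/ followed by $&, wrap the pattern in
# parentheses and read the match from the capture.
my ($matched) = $line =~ /(disk \w+)/;

# Instead of $` and $', capture the text before and after the match.
# /s lets '.' match newlines, and \z anchors at the very end of the string.
my ($before, $hit, $after) = $line =~ /^(.*?)(disk \w+)(.*)\z/s;

print "matched: $matched\n";
print "before: '$before', after: '$after'\n";
```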

$1 .. $99 are much nicer than $&. Perl won't copy all of your strings just
because it finds one of them in your program; it will just copy what is
captured in parentheses, and it will do that whether you use the $1
variables or not, because you might do

my ($this, $that) = $var =~ /(this) and (that)/;

In this regex, $1 and $2 will never be longer than 'this' or 'that'. That's
not much compared to $&, which can be almost as long as $var. And if $var
is several kilobytes long, that's several kilobytes that you really don't
need. That's a waste!
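
A quick way to see the difference in size, as a sketch only: the padding
around the phrase is made up, and the script just compares how much text
the captures hold with the length of the whole string that the match
variables would need a copy of.

```perl
use strict;
use warnings;

# A string of several kilobytes with the phrase buried in the middle.
my $var = ('x' x 4096) . 'this and that' . ('y' x 4096);

my ($this, $that) = $var =~ /(this) and (that)/;

# The captures hold 4 bytes each; the whole string is far longer.
my $captured = length($this) + length($that);
my $total    = length($var);

print "captures hold $captured bytes, the string is $total bytes\n";
```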

In your last benchmark, you don't just match, but substitute. That's
changing the original string, and that's more work.

Happy Perling - happy holidays
Cajo.

___Carl_Johan_Berglund_________________________
   Adverb Information
   carl.johan.berglund@adverb.se
   http://www.adverb.se/


