
Re: [MacPerl] Easy regex?



Carl,

Hmm. . . You're right. I remembered the regex speed rules wrongly. Both
Hall and Schwartz in _Effective Perl Programming_ and Friedl in
_Mastering Regular Expressions_ note that match variables (those cursed
three: $`, $&, and $') slow Perl scripts more than backreferences
(variables $1 through $9). Friedl mentions that there's an additional
speed penalty if you use more than nine backreferences (variables $10
through $99). However, Hall and Schwartz also mention that you should
benchmark your regular expressions. So I did, and found that the
"cursed" variables took nearly 30% less CPU time than the
backreferences:

##################
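#!perl -w

# Copy each HREF target from Explorer's history file to a new file.
open(FIN, 'Critheis:System Folder:Preferences:Explorer:History.html')
    or die "Can't read History.html: $!";
open(FOUT, '>Critheis:Desktop Folder:History.html')
    or die "Can't write History.html: $!";

while (<FIN>) {
    if ( /HREF="(.*?)"/i ) {
        print FOUT "$1\n";
    }
}

close(FIN);
close(FOUT);

exit(0);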
##################
##################
#!perl -w
use Benchmark;

open(FH, 'Critheis:Desktop Folder:History.html')
    or die "Can't read History.html: $!";
@history = <FH>;
close(FH);

timethese(10000,
#    { match_variables => q{
#        for ( @history ) {
#            m#.*/#;
#            $desired_path = $&
#        }
#    }},
    { backreferences => q{
        for ( @history ) {
            m#(.*)/#;
            $desired_path = $1
        }
    }}
);

exit(0);
##################

I commented each one out in turn so as not to contaminate the test
(once $& appears anywhere in a script, Perl pays the penalty on every
match). Here are the results I got:

Benchmark: timing 10000 iterations of match_variables...
match_variables: 52 secs (52.68 usr  0.00 sys = 52.68 cpu)

Benchmark: timing 10000 iterations of backreferences...
backreferences: 72 secs (74.03 usr  0.00 sys = 74.03 cpu)
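
To put the gap in numbers, here is a rough sketch that only reuses the
two cpu times reported above. The variable names are made up for
illustration and nothing here is re-measured.

```perl
#!perl -w
use strict;

# CPU seconds copied from the two Benchmark runs above.
my $match_variables = 52.68;
my $backreferences  = 74.03;

# Fraction of CPU time saved by the match-variable version.
my $saved = ( $backreferences - $match_variables ) / $backreferences;

# How many times longer the backreference version ran.
my $ratio = $backreferences / $match_variables;

printf "match variables used %.1f%% less CPU time\n", 100 * $saved;
printf "backreferences took %.2f times as long\n", $ratio;
```

With those two figures it reports 28.8% less CPU time and a ratio of
1.41.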

What's that all about?

My hypothesis? $1 through $99 are not *backreferences*; they're just
additional forms of those cursed match variables: They can be used
anywhere in the Perl script, even several lines away from the regex,
which is the damning charge against $`, $&, and $'. True backreferences
are only available in the replace part of a search-and-replace regex in
true backreference notation: \1 through \99.
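
The point about $1 outliving the regex can be seen in a small sketch.
The sample line is invented for illustration (it is not from the real
History.html), and the script only shows that $1 is still readable a
few statements after the match. It says nothing about speed.

```perl
#!perl -w
use strict;

# Hypothetical sample line, standing in for one entry of @history.
my $line = '<A HREF="http://www.example.com/docs/index.html">';

my $desired_path = '';
if ( $line =~ m#(.*)/# ) {
    # Unrelated statements (no further regex matches) run first...
    my $length = length $line;
    my $shout  = uc $line;

    # ...and $1 still holds what the parentheses captured.
    $desired_path = $1;
}

print "$desired_path\n";
```

Because .* is greedy, the capture runs up to the last slash in the
line, so this prints everything before "/index.html".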

So I timed this:

timethese(10000,
    { true_backreferences => q{
        for ( @history ) {
            s#(.*)/#\1#
        }
    }}
);

and got this:

Benchmark: timing 10000 iterations of true_backreferences...
true_backreferences: 121 secs (121.82 usr  0.00 sys = 121.82 cpu)

Now I'm really confused.


Robert


