Re: [FWP] duh: A small puzzler



> People asked for simpler puzzles; here's a small one. Perl gurus
> can't play (or play quietly while the lurkers consider :-)

Thank you! What fun is 'Fun With Perl' if, like me, you can't
understand anything anyone's posting?

> I did this today; I wasn't thinking... (well, it's Friday after all)
>
>    # records look like
>    # >12|c2169.2750 CHROMAT_FILE: 12=c2169.2750 PHD_FILE: ...
>    #
>    # I want to get just the c2169.2750 part
>
>    ($id) = split(/\s+/, $record);
>
>    $id =~ s/^.*|//;    # remove extraneous stuff before the construct info
>
> what's wrong? :-}

You're attempting to match a vertical bar, but because the vertical
bar is a regular expression metacharacter, your pattern is not
matching what you expect it to match. Simply escaping the special
meaning of the vertical bar with a backslash fixes the problem:

    $id =~ s/^.*\|//;

(The vertical bar is the regex alternation operator.)

So what does the regular expression pattern

    /^.*|/

attempt to match? And what does it succeed in matching? It attempts
to match either (1) any character (except newline) any number of
times (including zero) at the beginning of $id; or (2) the empty
string anywhere within $id. The first alternative matches successfully,
of course, so the entire string is matched and replaced with the
substitution string, which is the empty string. The expression

    $id =~ s/^.*|//;

is effectively the same as

    $id =~ s/.*//;

which is just a fancy way of writing

    $id = '';

;-).
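
Here is a quick way to check this for yourself. It is only a minimal sketch: the sample value and the variable names ($field, $unescaped, $escaped) are my own assumptions, not part of the original code.

```perl
use strict;
use warnings;

# Sample value: the first whitespace-delimited field of the record.
my $field = '>12|c2169.2750';

# Unescaped bar: the pattern means "^.*" OR "the empty string".
(my $unescaped = $field) =~ s/^.*|//;

# Escaped bar: the pattern means "everything through a literal |".
(my $escaped = $field) =~ s/^.*\|//;

print "unescaped: '$unescaped'\n";   # the whole field is gone
print "escaped:   '$escaped'\n";     # only '>12|' is removed
```

The unescaped version leaves the empty string; the escaped version leaves c2169.2750.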

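If you intend to match everything from the beginning of the string
through the FIRST vertical bar in the string, I think it makes
sense to use the minimal-matching star quantifier *?. The greedy
.* runs to the end of the string and then backs up to the LAST
vertical bar; the minimal version stops at the first one, and I
believe it also prevents a lot of unnecessary backtracking: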

    $id =~ s/^.*?\|//;

(I could be wrong on this point. Remember: This is a non-guru
responding.)

Or you could just be more explicit about saying "match anything
other than a vertical bar zero or more times at the beginning of
$id, followed by a vertical bar":

    $id =~ s/^[^|]*\|//;

This I'm SURE will prevent backtracking. (At least I think I'm
sure.)
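
To see where each of the three patterns stops, here is a small sketch. The input string 'a|b|c' and the variable names are made up for illustration; the real records contain only one bar in the first field, so all three would give the same answer there.

```perl
use strict;
use warnings;

# Hypothetical input with two bars, to show where each pattern stops.
my $s = 'a|b|c';

(my $greedy  = $s) =~ s/^.*\|//;      # removes through the LAST bar
(my $minimal = $s) =~ s/^.*?\|//;     # removes through the FIRST bar
(my $negated = $s) =~ s/^[^|]*\|//;   # removes through the FIRST bar

print "greedy:  $greedy\n";    # c
print "minimal: $minimal\n";   # b|c
print "negated: $negated\n";   # b|c
```

This says nothing about how much backtracking each one does; it only shows that the greedy form can give a different result when there is more than one bar.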

If you're only carving out that substring and you're throwing away
the rest of the record, you really only need to apply one regular
expression to it, like this:

    my $record = '>12|c2169.2750 CHROMAT_FILE: 12=c2169.2750 PHD_FILE:';
    my $id = (split /[|\s]/, $record)[1];  # $id is now 'c2169.2750'

By the way, because you keep only the first field, the plus
quantifier in this expression is superfluous:

    ($id) = split(/\s+/, $record);
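
A small sketch to convince yourself, assuming a record with a run of several spaces after the first field (the sample record and the names $with_plus and $without_plus are mine):

```perl
use strict;
use warnings;

# Hypothetical record with a run of spaces after the first field.
my $record = '>12|c2169.2750   CHROMAT_FILE: 12=c2169.2750';

# Only the first field is kept in each case.
my ($with_plus)    = split(/\s+/, $record);
my ($without_plus) = split(/\s/,  $record);

print "with +:    '$with_plus'\n";
print "without +: '$without_plus'\n";
print "same first field\n" if $with_plus eq $without_plus;
```

Without the plus, split produces empty fields between the extra spaces, but those come after the first field, so they never reach $id.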

Hey! Just UNDERSTANDING something for once can be fun, too!

-- 
Jim Monty
monty@primenet.com
Tempe, Arizona USA
