
Re: [Fun With Perl] What does this do? (SPOILERS)




> Ok, so now I know what this does, will someone explain what is going on
> here?

* * * SPOILERS * * * 




OK, the example was

	print "\L\uHELLO, \uWORLD!\n";

which prints
	           Hello, world!

Everyone who answered was surprised by something about this, although
not everyone was surprised by the same things.  Some people were
surprised that the H was capital, and others were surprised that the w
was lowercase.

When Perl sees something like

	"\LFOO"

it compiles it as if you had written 

	lc("FOO")

and similarly:

	"\LF\uOO"

compiles as if you had written

	lc("F" . ucfirst("OO"))

This is documented; if you look up `lc' in `perlfunc', you'll see that
it says:

        This is the internal function implementing the \L escape in
        double-quoted strings.

So if you have a little knowledge, you will expect that

	print "\L\uHELLO, \uWORLD!\n";

will compile as

	print lc(ucfirst("HELLO, " . ucfirst("WORLD!\n")));

and the effects of the \u's will be lost.  And, in general, this
analysis is correct, and they *are* lost.

But there's something bizarre happening here instead.  The source code
for Perl's lexer, which is the very lowest level of the parser, the
part that actually looks at the program one character at a time,
contains the following line:

	    if (strnEQ(s, "L\\u", 3) || strnEQ(s, "U\\l", 3))
		tmp = *s, *s = s[2], s[2] = tmp;	/* misordered... */

This is in the vicinity of line 1769 of toke.c, if you want to look.
`strnEQ' compares two strings for equality.  What this is doing is
looking to see if you have \L\u; if you do, it pretends that you wrote
it in the other order, as \u\L.  Similarly if it sees \U\l, it
pretends that you wrote \l\U instead.  That means that when it's
parsing this:

	print "\L\uHELLO, \uWORLD!\n";

it pretends that you wrote it as this instead:

	print "\u\LHELLO, \uWORLD!\n";

and then it compiles it like this:

	print ucfirst(lc("HELLO, " . ucfirst("WORLD!\n")));

So the effect of the second \u is lost, but the effect of the first
one isn't.  

For the record, I am not a toke.c expert.  I discovered this by
accident some time early in 1996, and you can see how surprised I was
at the time:

: I had a program in which a variable contained a town name in all caps.
: I wanted to capitalize only the first letter when I interpolated it
: into a double-quoted string.
: 
: I tried
: 
:       \L\u$town_name\E
: 
: which I knew wouldn't work, but it *did* work---I got the town name in
: lowercase with initial capital letter.

Here was Larry's reply:

# Fancy that.  :-)

The people who suggested that this is a bug are off the mark here.  It
may be an inadvisable feature, or an incomplete feature, or a puzzling
feature, or a naughty naughty feature, but to call it a bug implies
that it was an accident, and that is certainly not the case.


==== Want to unsubscribe from this list? (Don't you love us anymore?)
==== Well, if you insist... Send mail with body "unsubscribe" to
==== fwp-request@technofile.org