Re: [FWP] Regex to remove whitespace and to capitalize



Tim Ayers wrote:
> 
> Hi,
> 
> I'm not even just another Perl hacker so bear with me please.
> 
> I have a bunch of English phrases that I want to normalize by removing
> spaces and underlines and capitalizing each word, e.g.
> 
>   'Hi there'           => 'HiThere'
>   'Top Of The Morning' => 'TopOfTheMorning'
>   '25 or 6 to 4'       => '25Or6To4'
> 
> I started with two regexes
> 
>   $x =~ s/(\b.)/\U\1/g;
>   $x =~ s/[\s_]+//g;
> 
> How unimaginative.
> 
> So I tried
> 
>   $x =~ s/(^.)|[\s_]+(\S)/\U\1/g;
> 
> but that didn't work. For example, 'Hi there' goes to 'Hihere'. I was
> surprised by the loss of the 't'. I would have more expected
> 'HiHhere'. What's going on?
> 
> Finally I tried
> 
>   $x =~ s/(?:^|[\s_]+)(\S)/\U\1/g;
> 
> which seems to do the trick. Does anyone have an improvement or any
> caveats about this regex?
> 
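
On the "What's going on?" question: as far as I can tell, the 't' is lost
because it is captured by the second pair of parentheses, so it lands in $2,
not $1. When the second alternative matches, $1 is undefined and the
replacement is empty. Here is a minimal sketch; the sub names `broken` and
`either_group` are made up for illustration, and I use $1 where the original
wrote \1.

```perl
use strict;
use warnings;

# The attempt from the post: only $1 is used in the replacement.
sub broken {
    my ($x) = @_;
    no warnings 'uninitialized';   # $1 is undef when the second branch matches
    $x =~ s/(^.)|[\s_]+(\S)/\U$1/g;
    return $x;
}

# Same pattern, but pick whichever group actually took part in the match.
sub either_group {
    my ($x) = @_;
    $x =~ s/(^.)|[\s_]+(\S)/uc(defined $1 ? $1 : $2)/ge;
    return $x;
}

print broken('Hi there'), "\n";        # Hihere
print either_group('Hi there'), "\n";  # HiThere
```

The non-capturing group in the final regex avoids the issue because it
leaves only one capture group, so the letter always ends up in $1.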

Your final regex does not delete trailing spaces or underscores, which
your first regex pair does. I don't know whether that is a problem for
your data or not...
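
To make the trailing-whitespace difference concrete, here is a small sketch.
The sub names (`two_pass`, `one_pass`, `one_pass_trimmed`) are only labels I
made up, and the extra trim in the last one is just one possible workaround,
not necessarily the best one.

```perl
use strict;
use warnings;

# The original pair of substitutions (with $1 instead of \1).
sub two_pass {
    my ($x) = @_;
    $x =~ s/(\b.)/\U$1/g;
    $x =~ s/[\s_]+//g;
    return $x;
}

# The single-regex version: separators are only removed when a
# non-space character follows them.
sub one_pass {
    my ($x) = @_;
    $x =~ s/(?:^|[\s_]+)(\S)/\U$1/g;
    return $x;
}

# One possible workaround: strip a trailing run of separators first.
sub one_pass_trimmed {
    my ($x) = @_;
    $x =~ s/[\s_]+\z//;
    return one_pass($x);
}

my $input = 'hi there  ';
print "'", two_pass($input), "'\n";          # 'HiThere'
print "'", one_pass($input), "'\n";          # 'HiThere  '
print "'", one_pass_trimmed($input), "'\n";  # 'HiThere'
```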

Also there are slight differences in what gets capitalized, maybe best
pointed out by asking what the correct answer should be for the following
two cases:

	'good-bye' => 'Good-bye' or 'Good-Bye'
	'one_two'  => 'OneTwo' or 'Onetwo'

Which answers you want for these two cases makes a major difference to
the code...
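
For what it's worth, here is a quick sketch showing how the two approaches
from the original post answer these cases. The sub names are hypothetical
labels of mine. The difference comes from \b treating '-' as a word break but
'_' as a word character, while the single regex only looks at whitespace and
underscores.

```perl
use strict;
use warnings;

# Two-step approach: capitalize after every \b, then drop separators.
sub two_pass {
    my ($x) = @_;
    $x =~ s/(\b.)/\U$1/g;
    $x =~ s/[\s_]+//g;
    return $x;
}

# Single regex: capitalize at the start and after whitespace/underscores.
sub one_pass {
    my ($x) = @_;
    $x =~ s/(?:^|[\s_]+)(\S)/\U$1/g;
    return $x;
}

for my $phrase ('good-bye', 'one_two') {
    printf "%-10s two_pass: %-10s one_pass: %s\n",
        $phrase, two_pass($phrase), one_pass($phrase);
}
# good-bye: two_pass gives Good-Bye, one_pass gives Good-bye
# one_two:  two_pass gives Onetwo,   one_pass gives OneTwo
```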

Rick
