
Re: [FWP] Regex to remove whitespace and to capitalize



>>>>> "Tim" == Tim Ayers <tayers@bridge.com> writes:

Tim> Hi,
Tim> I'm not even just another Perl hacker so bear with me please.

Tim> I have a bunch of English phrases that I want to normalize by removing
Tim> spaces and underlines and capitalizing each word, e.g.

Tim>   'Hi there'           => 'HiThere'
Tim>   'Top Of The Morning' => 'TopOfTheMorning'
Tim>   '25 or 6 to 4'       => '25Or6To4'

Tim> I started with two regexes

Tim>   $x =~ s/(\b.)/\U\1/g;
Tim>   $x =~ s/[\s_]+//g;

Tim> How unimaginative.

Tim> So I tried

Tim>   $x =~ s/(^.)|[\s_]+(\S)/\U\1/g;

Tim> but that didn't work. For example, 'Hi there' goes to 'Hihere'. I was
Tim> surprised by the loss of the 't'. I would have more expected
Tim> 'HiHhere'. What's going on?

Tim> Finally I tried

Tim>   $x =~ s/(?:^|[\s_]+)(\S)/\U\1/g;

Tim> which seems to do the trick. Does anyone have an improvement or any
Tim> caveats about this regex?

Tim> Thank you very much and
Tim> Hope you have a very nice day, :-)
Tim> Tim Ayers (tayers@bridge.com)
Tim> Norman, Oklahoma
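
On the lost 't': in the second attempt the two alternatives have
separate capture groups. When the "[\s_]+(\S)" branch matches, the letter
lands in group 2 and group 1 is left undefined, so the whole match
(blank plus letter) is replaced by an empty string. Here is a minimal
sketch of that behaviour and one possible repair. The sub names are
hypothetical, and the /e form is only one way to do it, not necessarily
what anyone on the list would pick.

```perl
use strict;
use warnings;

# Tim's second attempt, unchanged apart from $1 in place of \1.
sub two_groups {
    my ($x) = @_;
    no warnings 'uninitialized';   # $1 is undef when the second branch matches
    $x =~ s/(^.)|[\s_]+(\S)/\U$1/g;
    return $x;
}

# Same alternation, but the replacement uses whichever group took part.
sub either_group {
    my ($x) = @_;
    $x =~ s/(^.)|[\s_]+(\S)/uc(defined $1 ? $1 : $2)/ge;
    return $x;
}

print two_groups('Hi there'), "\n";     # Hihere
print either_group('Hi there'), "\n";   # HiThere
```

The third attempt avoids the problem by putting both alternatives in a
non-capturing group, so the letter is always in group 1.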

I'm not sure whether you define words as "non-blanks" or as "sequences of
alphanumerics", so I'll show you both.  Each one matches against $_:

non-blanks:

  $output = join "", map ucfirst($_), /(\S+)/g;

sequences-of-alphanumerics:

  $output = join "", map ucfirst($_), /(\w+)/g;

Take yer pick.
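
To compare the two on the sample phrases, here is a small sketch that
binds the match to a variable instead of $_. The sub names are made up
for illustration. One thing to watch: \w matches the underscore, and so
does \S, so neither pattern splits on underlines. The third sub, using
[^\s_]+, is an assumption about what is wanted there and is not one of
the two above.

```perl
use strict;
use warnings;

# Words are runs of non-blanks.
sub squash_nonblank {
    my ($s) = @_;
    return join "", map ucfirst($_), $s =~ /(\S+)/g;
}

# Words are runs of \w characters (letters, digits, underscore).
sub squash_word {
    my ($s) = @_;
    return join "", map ucfirst($_), $s =~ /(\w+)/g;
}

# Assumed variant: treat underscores as separators as well as blanks.
sub squash_no_underscore {
    my ($s) = @_;
    return join "", map ucfirst($_), $s =~ /([^\s_]+)/g;
}

print squash_nonblank('25 or 6 to 4'), "\n";       # 25Or6To4
print squash_nonblank("don't stop"), "\n";         # Don'tStop
print squash_word("don't stop"), "\n";             # DonTStop
print squash_word('hi_there'), "\n";               # Hi_there
print squash_no_underscore('hi_there'), "\n";      # HiThere
```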

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
