
Re: [MacPerl] poets mix problems



The 'perlre' and 'perlop' manpages are the places to start. Chapter 6 in the 
Perl Cookbook is also excellent.

For example, Recipe 6.2 in the latter, "Matching Letters", is probably close 
to what you actually want. I don't wish to plagiarize, but in brief, you 
enable the locale pragma ('use locale'; read 'perllocale'), and employ an RE 
like

	$string =~ /^[^\W\d_]+$/

\w matches an alphabetic, a digit, or underscore (_). \W is everything else. 
So to get alphabetics, we specify that we do NOT want the "everything else", 
or digits, or the underscore. The _first_ ^ and the $ anchor the match to 
the beginning and end of the string.
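
As a minimal sketch of that RE in use (the helper name is_all_letters and 
the sample words are made up for illustration; the sample data is plain 
ASCII, so 'use locale' is left out here):

```perl
use strict;
use warnings;

# Hypothetical helper: true if $string consists solely of letters.
# [^\W\d_] means "a word character that is neither a digit nor underscore".
sub is_all_letters {
    my ($string) = @_;
    return $string =~ /^[^\W\d_]+$/ ? 1 : 0;
}

for my $word ('poet', 'poet42', 'snake_case', '') {
    printf "%-14s %s\n", "'$word'",
        is_all_letters($word) ? 'letters only' : 'rejected';
}
```

Note that $ also matches just before a trailing newline, so a string such 
as "poet\n" passes; use \z instead of $ if that matters.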

I'm not sure I understand why you would want a pattern that matches every 
character in your character set. :-)

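In terms of extracting following strings, if you have 'wholetext' in 
variable $wholetext, and 'starting_string' in $starting_string, then the RE

	($matched) = ($wholetext =~ /\Q$starting_string\E(.*)/);

will return everything after the first 'starting_string' in variable 
$matched. The \Q...\E pair quotes any metacharacters (such as ".") in 
$starting_string, so it is matched as plain text rather than as a pattern. 
I have deliberately left the match greedy; you might wish to change that. 
Note that "." does not match a newline unless you add the 's' modifier. If 
you want to retrieve what follows every occurrence of 'starting_string', 
use the 'g' modifier with a narrower capture, such as (.) for a single 
character.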
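
A short sketch of the extraction, assuming the starting string may contain 
metacharacters and should be treated as plain text. The sample data and the 
variable names $rest, @following and @unquoted are invented for 
illustration; \Q...\E is standard Perl quoting (see 'perlre').

```perl
use strict;
use warnings;

# Made-up sample data; the starting string contains the metacharacter '.'.
my $wholetext       = 'a.b=1; axb=2; a.b=3';
my $starting_string = 'a.b=';

# \Q...\E quotes metacharacters, so '.' only matches a literal dot.
# Greedy (.*) returns everything after the first occurrence.
my ($rest) = $wholetext =~ /\Q$starting_string\E(.*)/;

# With the 'g' modifier in list context, collect the single character
# that follows each occurrence of the starting string.
my @following = $wholetext =~ /\Q$starting_string\E(.)/g;

# For comparison: unquoted, '.' is a wildcard and 'axb=' matches as well.
my @unquoted = $wholetext =~ /$starting_string(.)/g;

print "rest:      $rest\n";        # 1; axb=2; a.b=3
print "following: @following\n";   # 1 3
print "unquoted:  @unquoted\n";    # 1 2 3
```

The quotemeta function does the same job as \Q...\E if you would rather 
prepare the pattern string ahead of time.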

Theoretically, proper use of 'locale' and POSIX will account for Unicode, 
and your REs should work, although how well depends on your Perl version 
and on the locales installed. Mind you, some concepts like \b for a word 
boundary may not exist for a given script.
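
As a rough illustration for Chinese text, here is a sketch that does not 
use 'locale' at all. It swaps in Perl's own Unicode character strings and 
the \p{L} ("Letter") property, which assumes a Perl with Unicode string 
support. The two sample characters are written as code point escapes so the 
example does not depend on the file's encoding.

```perl
use strict;
use warnings;

# Two Chinese characters (U+4E2D, U+6587) as a Unicode character string.
my $chinese = "\x{4E2D}\x{6587}";

# \p{L} is the Unicode "Letter" property.
my $letters_only = $chinese =~ /^\p{L}+$/ ? 1 : 0;

# The letters-only class from above behaves the same on this string.
my $class_match = $chinese =~ /^[^\W\d_]+$/ ? 1 : 0;

# Both characters count as \w, so \b finds no boundary between them.
my $has_boundary = $chinese =~ /\x{4E2D}\b\x{6587}/ ? 1 : 0;

print "letters only: $letters_only\n";   # 1
print "class match:  $class_match\n";    # 1
print "boundary:     $has_boundary\n";   # 0
```

The last test shows the \b caveat: every character here is a word 
character, so \b says nothing about where one Chinese word ends and the 
next begins.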

Arved Sandstrom

At 10:47 PM 2/26/00 +0200, miku wrote:

[ Brutal snippage ]

>More to the point, my problem is: if I do a pattern search, non-letter
>non-number characters like "." within the starting string might be
>interpreted as wildcards or other embedded options/commands
>(meta-characters). How can I make the search interpret its pattern as a
>plain-text string that might contain *all* 256 characters of my character
>set? And how can I construe and apply a mapping function that extracts all
>single characters following the starting string in "wholetext"? And, for
>future expansion: assuming a unicode input: how to deal with, for example,
>the large variety of Chinese characters? In which ways would the script
>probably have to be adapted?
>


