
Re: [MacPerl] poets mix problems

To: miku <miku@onlinehome.de>
Subject: Re: [MacPerl] poets mix problems
From: Arved Sandstrom <Arved_37@chebucto.ns.ca>
Date: Sat, 26 Feb 2000 23:30:28 -0400
Cc: Mac Perl <macperl@macperl.org>
In-Reply-To: <B4DE0824.DFF%miku@onlinehome.de>

The 'perlre' and 'perlop' manpages are the places to start. Chapter 6 in the 
Perl Cookbook is also excellent.

For example, Recipe 6.2 in the latter, "Matching Letters", is probably close 
to what you actually want. I don't wish to plagiarize, but in brief, you use 
the locale pragma (read 'perllocale'), and employ an RE like

	$string =~ /^[^\W\d_]+$/

\w matches an alphabetic, a digit, or underscore (_). \W is everything else. 
So to get alphabetics, we specify that we do NOT want the "everything else", 
or digits, or the underscore. The _first_ ^ and the $ anchor the match to 
the beginning and end of the string.
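
As a minimal sketch of the character class above in use (the helper name 
is_all_letters and the sample words are made up for illustration, and the 
script runs without 'use locale', so only ASCII letters are expected to 
count as alphabetics here):

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $string consists only of letters.
# [^\W\d_] means "a \w character that is neither a digit nor underscore".
# Note that $ also tolerates one trailing newline; use \z to forbid it.
sub is_all_letters {
    my ($string) = @_;
    return $string =~ /^[^\W\d_]+$/ ? 1 : 0;
}

for my $word ('poet', 'mix2', 'snake_case', 'two words', '') {
    printf "%-14s %s\n", "'$word'",
        is_all_letters($word) ? 'letters only' : 'rejected';
}
```

Only 'poet' should pass: the others contain a digit, an underscore, or a 
space, or are empty (the + requires at least one letter).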

I'm not sure I understand why you would want a pattern that matches every 
character in your character set. :-)

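In terms of extracting following strings, if you have 'wholetext' in 
variable $wholetext, and 'starting_string' in $starting_string, then the RE

	($matched) = ($wholetext =~ /\Q$starting_string\E(.*)/);

will return everything after the first 'starting_string' in variable 
$matched (up to the end of that line, since '.' does not match a newline 
unless you add the 's' modifier). The \Q...\E quoting makes any 
metacharacters in $starting_string, such as ".", match literally, which is 
the plain-text behaviour you asked about. I have deliberately left the match 
greedy; you might wish to change that, or if you want to retrieve each 
following match for all occurrences of 'starting_string', then you use the 
'g' modifier.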
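
Here is a small sketch of the 'g' modifier collecting the single character 
that follows each occurrence of the starting string, with and without 
quoting the starting string via \Q...\E so that its metacharacters are 
taken literally. The sample data is invented for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample data; note the "." inside the starting string.
my $wholetext       = "a.b=1 axb=2 a.b=3";
my $starting_string = "a.b=";

# Unquoted: "." acts as a wildcard, so "axb=" is matched as well.
my @unquoted = $wholetext =~ /$starting_string(.)/g;

# Quoted with \Q...\E: every character of $starting_string is literal.
my @quoted = $wholetext =~ /\Q$starting_string\E(.)/g;

print "unquoted: @unquoted\n";
print "quoted:   @quoted\n";
```

In list context a match with 'g' returns all the captured pieces, so 
@unquoted picks up the character after "axb=" too, while @quoted holds only 
the characters following a literal "a.b=".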

Theoretically, proper use of 'locale' and POSIX will account for Unicode, and 
your REs should work. Mind you, some concepts like \b for a word boundary 
may not exist for a given script.
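
For what it is worth, here is a rough sketch of letter matching on Unicode 
text. It assumes a Perl with Unicode property support (\p{L} for "any 
letter"), which is newer than what this thread is likely running, and it 
assumes the input has already been decoded into character strings. The 
helper name and the sample strings are made up:

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $string consists only of Unicode letters.
# Assumes $string holds decoded characters, not raw bytes.
sub is_all_letters_unicode {
    my ($string) = @_;
    return $string =~ /^\p{L}+\z/ ? 1 : 0;
}

# Two CJK ideographs written as escapes, so the file itself stays ASCII.
my $ideographs = "\x{8A69}\x{4EBA}";
my $with_digit = "\x{8A69}1";

# On character strings like these, the letters-only class from earlier
# in this message should behave the same way.
my $class_ok = $ideographs =~ /^[^\W\d_]+$/ ? 1 : 0;

print "ideographs: ", is_all_letters_unicode($ideographs), "\n";
print "with digit: ", is_all_letters_unicode($with_digit), "\n";
print "class form: $class_ok\n";
```

The pattern itself does not need to change per script; what has to be right 
is that the data reaches the RE as characters rather than undecoded bytes.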

Arved Sandstrom

At 10:47 PM 2/26/00 +0200, miku wrote:

[ Brutal snippage ]

>More to the point, my problem is: if I do a pattern search, non-letter
>non-number characters like "." within the starting string might be
>interpreted as wildcards or other embedded options/commands
>(meta-characters). How can I make the search interpret its pattern as a
>plain-text string that might contain *all* 256 characters of my character
>set? And how can I construe and apply a mapping function that extracts all
>single characters following the starting string in "wholetext"? And, for
>future expansion: assuming a unicode input: how to deal with, for example,
>the large variety of Chinese characters? In which ways would the script
>probably have to be adapted?
>

