
Re: [MacPerl] sub-search match novice question



At 8:45 AM +0000 9/5/99, Bart Lateur wrote:
>On Sat, 4 Sep 1999 22:51:28 -0400, Randy M. Zeitman wrote:
>
>>I'm matching a string to text file of records (rows). Each entry (column)
>>in each record is tab separated (a la spreadsheet).
>>
>>But sometimes I want to only match against columns 2,3,and 4, of all
>>records and other times I want to match only against column 7. Any quicker
>>way to do this than to parse each record into an array toss the unwanted
>>entries and re-string the thing? (a guess...)
>
>You could.
>
>>Don't actually need the answer, but a kick in the right direction...how
>>might I do this with one elegant match?
>
>
>	local $" = "\t"; # why not?
>	while(<FILE>) {
>		chomp;
>		my @field = split /\t/;
>		if("@field[2,4,7]" =~ /searchterm/) {
>			print "Got a match in $_\n";
>			# prints record
>		}
>	}
[snip]

That's zippy, but I'd prefer to use Perl's grep, although it means 
slurping in the whole file, which can cost performance-wise if the 
file is large.

The logic of the above statement

>		if("@field[2,4,7]" =~ /searchterm/) {

seems not very useful if you know what your data structure is. For 
example, let's say your file has records like this (header line shown 
first for reference; it isn't needed in the actual file):

FIRSTNAME\tLASTNAME\tPHONE\tSTREET\tCITY\tSTATE\tZIP
Fred\tJones\t213-555-1111\t123 Elm St\tLos Angeles\tCA\t90024
Martha\tWashington\t408-555-2222\t123 Front Street\tLos Altos\tCA\t95021
Olive\tOyle\t831-555-9999\t35 Wharf Road\tSanta Cruz\tCA\t95060
Mudhen\tRainbow\t831-555-7777\tTown Clock\tSanta Clara\tCA\t94567

If I didn't mis-type, we have a header and 4 records with seven 
fields (columns).

Open the file and slurp all the lines into an array:

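# Three-argument open with its own handle; DATA is Perl's built-in __DATA__ handle.
open(my $data_fh, '<', $datafile) or die("Can't open data file $datafile\n $! \n");
@all_records = (<$data_fh>);
close $data_fh;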

Now, if you were searching your database for records with a certain 
zip, they'd only appear in the 7th column. You'd get bogus returns if 
your method allows the searchterm to match data in any other column.
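
To make the zip case concrete, here is a minimal sketch that tests only 
the 7th column, reading line by line instead of slurping. The in-memory 
filehandle and the zip value 95060 are my assumptions for illustration 
(in-memory handles need Perl 5.8 or later); swap in a real file as needed.

```perl
use strict;
use warnings;

# Stand-in for the data file: the four sample records, no header line.
my $text = join '',
    "Fred\tJones\t213-555-1111\t123 Elm St\tLos Angeles\tCA\t90024\n",
    "Martha\tWashington\t408-555-2222\t123 Front Street\tLos Altos\tCA\t95021\n",
    "Olive\tOyle\t831-555-9999\t35 Wharf Road\tSanta Cruz\tCA\t95060\n",
    "Mudhen\tRainbow\t831-555-7777\tTown Clock\tSanta Clara\tCA\t94567\n";

open(my $fh, '<', \$text) or die "Can't open in-memory data: $!\n";

my @found_records;
while (my $line = <$fh>) {
    # Skip six tab-terminated fields, then require the 7th to be exactly 95060.
    # The trailing $ still matches because it allows one final newline.
    push @found_records, $line if $line =~ /^(?:[^\t]*\t){6}95060$/;
}
close $fh;

print @found_records;
```

Because each line is tested as it is read, memory use stays flat no 
matter how large the file gets.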

Besides that, maybe you want to find records that match search 
criteria in more than one column, say all the people with last name 
'Jones' and zip code '99324'. I haven't tried it, but the above 
example might allow /searchterm/ to be written in a form that 
corresponds to the multi-field array slice @field[2,4,7] in the if 
statement. That seems like a bother, though: a comma-separated list, 
for example, wouldn't be good enough because there might be commas 
_in_ the data.

So instead, build a regular expression to use with grep. To find all 
the records with last name 'Jones' in the above sample database, try:

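# No slashes inside the string: they would be matched as literal '/' characters.
# [^\t]* (any run of non-tabs) keeps each piece inside its own column.
$grep_pat = '^[^\t]*\tJones\t[^\t]*\t[^\t]*\t[^\t]*\t[^\t]*\t[^\t]*';
@found_records = grep(/$grep_pat/i, @all_records);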

Now you have a list of records, @found_records, in the same form they 
exist in your database, to process as you want.

The '^' at the beginning anchors the pattern to the start of the 
record (line), so the fields in the pattern are counted from the 
first column. The 'i' allows case-insensitive matches.
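
Here is a self-contained sketch of that grep, run against the sample 
records held in memory rather than read from a file. The pattern form is 
my assumption: it uses [^\t]* (a run of non-tab characters) and is 
stored without surrounding slashes, since it gets interpolated into 
/.../ anyway.

```perl
use strict;
use warnings;

# The four sample records, newline still attached as <FILE> would leave it.
my @all_records = (
    "Fred\tJones\t213-555-1111\t123 Elm St\tLos Angeles\tCA\t90024\n",
    "Martha\tWashington\t408-555-2222\t123 Front Street\tLos Altos\tCA\t95021\n",
    "Olive\tOyle\t831-555-9999\t35 Wharf Road\tSanta Cruz\tCA\t95060\n",
    "Mudhen\tRainbow\t831-555-7777\tTown Clock\tSanta Clara\tCA\t94567\n",
);

# Single quotes leave \t for the regex engine to interpret as a tab.
# First column is anything without a tab; second column must be 'Jones'.
my $grep_pat = '^[^\t]*\tJones\t';

my @found_records = grep { /$grep_pat/i } @all_records;

print @found_records;
```

Only the leading part of the record needs spelling out here, because 
nothing after the last-name column is being tested.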

To find all the records with last name 'Jones' and zip '94567', it would be:

# No slashes inside the string; the final $ pins 94567 to the last column.
$grep_pat = '^[^\t]*\tJones\t[^\t]*\t[^\t]*\t[^\t]*\t[^\t]*\t94567$';
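
If counting placeholder fields gets tedious, a two-column test can also 
be sketched by splitting each record and checking the columns by number. 
This swaps in split for the single-regex approach; match_fields is a 
hypothetical helper name of mine, and I test for 'Rainbow' with zip 
'94567' only because the sample data has such a record. Columns are 
0-based, so LASTNAME is 1 and ZIP is 6.

```perl
use strict;
use warnings;

# Hypothetical helper: true if every listed column (0-based) matches its pattern.
sub match_fields {
    my ($record, %want) = @_;
    chomp(my $line = $record);
    my @field = split /\t/, $line, -1;    # -1 keeps trailing empty fields
    for my $col (keys %want) {
        return 0 unless defined $field[$col] && $field[$col] =~ $want{$col};
    }
    return 1;
}

my @all_records = (
    "Fred\tJones\t213-555-1111\t123 Elm St\tLos Angeles\tCA\t90024\n",
    "Martha\tWashington\t408-555-2222\t123 Front Street\tLos Altos\tCA\t95021\n",
    "Olive\tOyle\t831-555-9999\t35 Wharf Road\tSanta Cruz\tCA\t95060\n",
    "Mudhen\tRainbow\t831-555-7777\tTown Clock\tSanta Clara\tCA\t94567\n",
);

# Both conditions must hold: last name is 'Rainbow' AND zip is '94567'.
my @found = grep {
    match_fields($_, 1 => qr/^Rainbow$/i, 6 => qr/^94567$/)
} @all_records;

print @found;
```

Each per-column pattern is anchored with ^ and $ for an exact match; 
drop an anchor for starts-with or ends-with behavior.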

Using this approach also allows you to find records based on your 
choice of partial/exact/starts-with/ends-with criteria for the data 
in different fields. Sticking with 'Jones' in the last name field, 
that portion of the regex would be one of the following ([^\t]* means 
'any run of non-tab characters'):

\tJones\t                # Matches on: last name eq 'Jones'

\tJones[^\t]*\t          # Matches on: last name starts with 'Jones'

\t[^\t]*Jones[^\t]*\t    # Matches on: last name has 'Jones' anywhere in it

\t[^\t]*Jones\t          # Matches on: last name ends with 'Jones'

\tJo[^\t]*\t             # Matches on: last name starts with 'Jo'

Also, if you have to deal with variations like 'Bob'/'Robert', try:

\t(Bob|Robert)\t 	# Exact match to 'Bob' OR 'Robert'

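Remember that these are *portions* of the regex, just showing 
varieties of matching in a particular field/column. Also, IMPORTANT 
NOTE: learn about 'greedy' matching. A bare .* matches as much as it 
can, tabs included, so it can run past the end of one field into the 
next and give you a match in a column you didn't intend. A character 
class that excludes the delimiter, [^\t]*, stays inside a single field.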
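
A small sketch of what can go wrong. The record below is made up for the 
demonstration: the last name is 'Smith', and 'Jones' appears only at the 
start of the street column.

```perl
use strict;
use warnings;

# Hypothetical record: LASTNAME is 'Smith'; STREET begins with 'Jones'.
my $record = "Ann\tSmith\t831-555-0000\tJones Plaza\tSanta Cruz\tCA\t95060";

my $loose  = '^.*\tJones.*\t';            # .* is free to swallow tabs
my $strict = '^[^\t]*\tJones[^\t]*\t';    # each piece stays inside one field

my $loose_hit  = $record =~ /$loose/  ? 1 : 0;
my $strict_hit = $record =~ /$strict/ ? 1 : 0;

print "loose:  ", ($loose_hit  ? "match" : "no match"), "\n";
print "strict: ", ($strict_hit ? "match" : "no match"), "\n";
```

The loose pattern reports a match because its leading .* consumes the 
first three columns, tabs and all, before finding 'Jones' in the fourth. 
The strict pattern can only look at the second column and so reports no 
match.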

You can build the grep pattern dynamically or hard-code it; this all 
depends on what your search needs are. You can do successive greps if 
you want more alternatives: the first grep might match all the 
records with last name 'Jones' and the second might match all records 
with state 'New Jersey'. Put 'em together in one list (dropping any 
record that turned up in both greps) for a group of records for which 
last name is 'Jones' OR state is 'New Jersey'.
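
A sketch of the two-grep OR, with duplicates removed. Since every sample 
record is in CA, I've assumed different criteria that actually overlap 
in the sample data: last name 'Rainbow' OR city starting with 'Santa'. 
The %seen hash is a common Perl idiom for keeping the first copy of each 
item.

```perl
use strict;
use warnings;

my @all_records = (
    "Fred\tJones\t213-555-1111\t123 Elm St\tLos Angeles\tCA\t90024\n",
    "Martha\tWashington\t408-555-2222\t123 Front Street\tLos Altos\tCA\t95021\n",
    "Olive\tOyle\t831-555-9999\t35 Wharf Road\tSanta Cruz\tCA\t95060\n",
    "Mudhen\tRainbow\t831-555-7777\tTown Clock\tSanta Clara\tCA\t94567\n",
);

# First grep: LASTNAME (2nd column) is exactly 'Rainbow'.
my @by_name = grep { /^[^\t]*\tRainbow\t/i } @all_records;

# Second grep: skip four columns, then CITY (5th column) starts with 'Santa'.
my @by_city = grep { /^(?:[^\t]*\t){4}Santa[^\t]*\t/i } @all_records;

# Merge the two lists, keeping only the first copy of each record.
my %seen;
my @either = grep { !$seen{$_}++ } (@by_name, @by_city);

print @either;
```

Mudhen Rainbow of Santa Clara satisfies both tests but appears once in 
@either. The same OR could also be done in a single pass by putting both 
tests inside one grep block joined with ||.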

Hope this helps. One caveat: Bart and others who responded to his 
approach might know way more than I do. I've been programming 
professionally in Perl since 1995, but my knowledge tends to expand 
in response to the projects I do (or with thanks to folks on this 
list who correct me), so I don't claim to know the whole territory by 
any means...

Good luck!






- Bruce

__Bruce_Van_Allen___bva@cruzio.com__831_429_1688_V__
__PO_Box_839__Santa_Cruz_CA__95061__831_426_2474_W__

