
Re: [FWP] duh: A small puzzler



At 11:18 -0700 07/31/1999, Brian "L." Matthews wrote:
>|>     ($id) = split(/\s+/, $record);
>
>However, if a single space is the column separator and columns can be
>empty, then the + isn't just superfluous, it's wrong. Vicki didn't give
>us enough information to say which is correct. Of course, she wasn't asking
>about the split either...

True... the split was only there to give you all the original record and
the first field, so it looked more like the actual code. People get unhappy
if they don't know where the variables get set :)
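To make Brian's point concrete: when a single space separates columns and one column is empty, /\s+/ quietly merges the empty column away, while / / keeps it. Here is a small sketch. The record string is made up for illustration and is not Vicki's data.

```perl
use strict;
use warnings;

# Hypothetical record: three columns separated by single spaces,
# where the middle column is empty (hence two spaces in a row).
my $record = "id42  comment";

# /\s+/ treats any run of whitespace as one separator,
# so the empty middle column disappears.
my @plus = split(/\s+/, $record);    # ("id42", "comment")

# / / splits on every single space, so the empty column survives.
my @single = split(/ /, $record);    # ("id42", "", "comment")

print scalar(@plus), " fields with /\\s+/, ",
      scalar(@single), " fields with / /\n";
```

For the first field alone, as in `($id) = split(...)`, both patterns give the same `$id`. The difference only appears in the later columns.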

For those who care, the columns don't matter in this case. In these
records, the first field (from just after the > up to the first whitespace)
is the identifier; everything after that, up to the first newline, is
commentary, aka the description. Everything after that first newline is
data, up to the next >. Rinse, repeat.
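Those rules translate fairly directly into code. The following is a minimal sketch that assumes the whole file is already in one string. The `parse_fasta` name and the hash keys are hypothetical, and this is not the code from the original puzzle.

```perl
use strict;
use warnings;

# Hypothetical helper: split a FASTA-format string into records.
#   id   = text after '>' up to the first whitespace
#   desc = rest of that header line (may be empty)
#   seq  = all following lines up to the next '>', newlines removed
sub parse_fasta {
    my ($text) = @_;
    my @records;
    # Split only on a '>' at the start of a line (/m makes ^ match there).
    my @chunks = split /^>/m, $text;
    shift @chunks;    # discard anything before the first '>' (usually "")
    for my $chunk (@chunks) {
        my ($header, $data) = split /\n/, $chunk, 2;
        $data = '' unless defined $data;
        # split ' ' skips leading whitespace; limit 2 keeps the description whole
        my ($id, $desc) = split ' ', $header, 2;
        $desc = '' unless defined $desc;
        (my $seq = $data) =~ s/\s+//g;    # join the sequence lines
        push @records, { id => $id, desc => $desc, seq => $seq };
    }
    return @records;
}

# Usage with a made-up two-record example.
my @r = parse_fasta(">seq1 first test\nACGT\nTTGA\n>seq2\nGGCC\n");
print "$_->{id}\t$_->{desc}\t$_->{seq}\n" for @r;
```

Splitting on /^>/m instead of a bare /> / means a '>' inside a description line does not start a new record.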

The format is known as FASTA and is used for DNA sequence data. Records
actually look like this:

>identifier descriptive information with possible whitespace\n
sequence data\n
sequence data\n
...
>identifier descriptive information with possible whitespace\n
sequence data\n
sequence data\n
...

It's a little weird to parse because the records contain newlines, but it's
very regular.  You'll probably see it occasionally from me; it's been in
several other puzzles I've posted :-)
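One common way to cope with the embedded newlines is to change Perl's input record separator, `$/`, so that each read returns a whole record rather than a single line. The sketch below sets `$/` to "\n>". The `read_fasta_fh` name is hypothetical, and an in-memory filehandle stands in for a real file.

```perl
use strict;
use warnings;

# Hypothetical helper: read FASTA records from a filehandle, one per read,
# by making "\n>" (newline followed by '>') the record separator.
sub read_fasta_fh {
    my ($fh) = @_;
    my @records;
    local $/ = "\n>";    # restored automatically when the sub returns
    while (my $rec = <$fh>) {
        chomp $rec;        # chomp strips the trailing "\n>" (the current $/)
        $rec =~ s/^>//;    # only the first record still carries its '>'
        my ($header, $data) = split /\n/, $rec, 2;
        my ($id, $desc) = split ' ', $header, 2;
        $data = '' unless defined $data;
        $desc = '' unless defined $desc;
        $data =~ s/\s+//g;    # concatenate the sequence lines
        push @records, [ $id, $desc, $data ];
    }
    return @records;
}

# Usage: read from an in-memory string instead of a real file.
my $text = ">seq1 first test\nACGT\nTTGA\n>seq2\nGGCC\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
my @recs = read_fasta_fh($fh);
close $fh;
print join("\t", @$_), "\n" for @recs;
```

Reading record by record keeps memory use proportional to one record, which matters more than convenience for large sequence files.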

- V.
-- --
       |\      _,,,---,,_       Vicki Brown <vlb@cfcl.com>
 ZZZzz /,`.-'`'    -.  ;-;;,_   Journeyman Sourceror: Scripts & Philtres
      |,4-  ) )-,_. ,\ (  `'-'  P.O. Box 1269  San Bruno  CA  94066
     '---''(_/--'  `-'\_) http://www.cfcl.com/~vlb  http://www.macperl.com
