Mark Rogaski <wendigo@pobox.com> writes:

> An entity claiming to be Michael G Schwern (schwern@pobox.com) wrote:
> :
> : > > CCAA CCAA AAGT CAGT TCCT CGCT ATGT AACA CACA TCTT GGCT TTGT AACA GTGT
> : >
> : > these would better be arranged in groups of three, as three
> : > bases make a codon, which codes for one amino acid.
> :
> : True, but that means I could only encode 6 bits of information per
> : group.  God obviously never had to work with high ASCII.
> :
>
> Is it necessary to align on any particular boundaries?  If you treat the
> source as a bit string and break it into 6 bit units, codons work fine
> (unless there is some distribution requirement).

The 6-bit requirement is generally considered the alignment
requirement (bioinformaticians routinely search for "open reading
frames" -- large segments with no stop codons).

But actually it's a lot worse than that.  In eukaryotes like you,
genes are composed of exons and introns.  The introns drop out of the
RNA, leaving only the concatenation of exons to be translated to
protein.  So the alignment requirement applies only to "code", not to
the "comments" that get stripped during preprocessing.  Only we don't
really understand introns.

Oh, and there can be more than one "parse" of a gene into exons and
introns.

The relevance to Perl?  For a start, check out what Lincoln Stein
*really* does, when he's not coding CGI.pm...

-- 
Ariel Scolnicov        |"GCAAGAATTGAACTGTAG"            | ariels@compugen.co.il
Compugen Ltd.          |Tel: +972-2-5713025 (Jerusalem) \ We recycle all our Hz
72 Pinhas Rosen St.    |Tel: +972-3-7658117 (Main office)`---------------------
Tel-Aviv 69512, ISRAEL |Fax: +972-3-7658555    http://3w.compugen.co.il/~ariels

==== Want to unsubscribe from Fun With Perl?  Well, if you insist...
==== Send email to <fwp-request@technofile.org> with message _body_
====   unsubscribe
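
For anyone who wants to play with the "bit string broken into 6-bit
units" idea from the quoted text, here is a minimal sketch in Python.
It is an illustration only, not anything the posters wrote.  The
2-bit-per-base mapping (A=00, C=01, G=10, T=11), the zero padding, and
the function names bytes_to_codons and codons_to_bytes are all
assumptions made up for the example.  It does nothing about the
"distribution requirement", so the output may well contain stop codons.

```python
# Assumed mapping, chosen only for this sketch: A=00, C=01, G=10, T=11.
BASES = "ACGT"


def bytes_to_codons(data):
    """Treat data as one bit string and cut it into 6-bit units (codons)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    # Pad with zero bits so the length is a multiple of 6 (an assumption;
    # the thread does not say how a ragged tail should be handled).
    bits += "0" * (-len(bits) % 6)
    codons = []
    for i in range(0, len(bits), 6):
        unit = bits[i:i + 6]
        # Each pair of bits selects one base, so 6 bits give 3 bases.
        codons.append("".join(BASES[int(unit[j:j + 2], 2)] for j in (0, 2, 4)))
    return codons


def codons_to_bytes(codons, length):
    """Inverse of bytes_to_codons; length is the original byte count."""
    bits = "".join(f"{BASES.index(base):02b}" for codon in codons for base in codon)
    # Only the first length * 8 bits are data; the rest is padding.
    return bytes(int(bits[i:i + 8], 2) for i in range(0, length * 8, 8))


if __name__ == "__main__":
    message = b"Perl"
    codons = bytes_to_codons(message)
    print(" ".join(codons))
    print(codons_to_bytes(codons, len(message)))
```

Because the padding is not self-describing, the decoder has to be told
the original length; a real scheme would need some other convention.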