
RE: [FWP] More Simplification




> -----Original Message-----
> From: Tushar Samant [mailto:tushar@i-works.com]
> Sent: Saturday, June 19, 1999 14:46
> To: fwp@technofile.org
> Subject: Re: [FWP] More Simplification
> 
> 
> > Problem: Convert all characters not in the set [AaTtCcGg] to [Nn],
> > preserving case:
...
> > Anyone have a nicer solution?
> 
> How about uglier?
> 
>            s/([^ATCG])/'N'^' '&$1/egi;
> 
> And non-portable...

It's as portable as the ASCII character set, which means damned near
ubiquitous these days.
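
For anyone puzzling over why it depends on ASCII: in ASCII, upper- and
lower-case letters differ only in bit 0x20, which is also the code for a
space.  The sketch below wraps the quoted substitution in a helper (the name
mask_xor is made up for illustration) so the effect is easy to see.  Note that
non-letters simply follow whatever bit 0x20 happens to be, so the digits
come out as 'n' with this variant.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper name; the substitution itself is the one quoted above.
sub mask_xor {
    my ($s) = @_;
    # & binds tighter than ^, so this is 'N' ^ (' ' & $1).
    # ' ' & $1 keeps only bit 0x20 of the matched character (the ASCII
    # case bit); XORing that into 'N' gives 'N' (bit clear) or 'n' (bit set).
    $s =~ s/([^ATCG])/'N' ^ ' ' & $1/egi;
    return $s;
}

print mask_xor("ACGTxyzXYZacgt"), "\n";   # prints ACGTnnnNNNacgt
```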

But, as I said before, it and its friends are S-L-O-O-O-W!

The following benchmark times the correct suggestions so far.  It was
run on perl 5.005_03, with 14 as the argument (1 << 14 = 16384 iterations).


#!/usr/local/bin/perl -w
use strict;
use Benchmark;

my $seq = join "" => '0' .. '9', 'a' .. 'z', 'A' .. 'Z';

timethese(1 << (shift || 0), {
  Ord   => sub { (my $x = $seq) =~ s/([^ACTGactg])/
                   chr(ord('N') - ord(uc($1)) + ord($1))/eg },
  Perl4 => sub { (my $x = $seq) =~ s/([^ACTGactg])/
                   pack('C', ord('N') - ord("\U$1") + ord($1))/eg },
  Tr    => sub { (my $x = $seq) =~ tr/a-zATCG/N/c;
                 $x =~ tr/A-Zatcg/n/c },
  Xor   => sub { (my $x = $seq) =~ s/([^ACTGactg])/
                   'N' ^ $1 ^ "\U$1"/eg },
});
__END__

Benchmark: timing 16384 iterations of Ord, Perl4, Tr, Xor...
       Ord: 16 wallclock secs (15.98 usr +  0.00 sys = 15.98 CPU)
     Perl4: 19 wallclock secs (18.45 usr +  0.00 sys = 18.45 CPU)
        Tr:  0 wallclock secs ( 0.39 usr +  0.00 sys =  0.39 CPU)
            (warning: too few iterations for a reliable count)
       Xor: 17 wallclock secs (16.47 usr +  0.00 sys = 16.47 CPU)

'Nuff said?
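
In case the two-pass tr is not obvious, here is a minimal sketch of it as a
standalone function, checked against the Ord variant on the same test
string.  The names mask_tr and mask_ord are made up for illustration; the
bodies are taken from the benchmark above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrapper around the Tr entry from the benchmark.
sub mask_tr {
    my ($s) = @_;
    $s =~ tr/a-zATCG/N/c;   # pass 1: all but lower-case letters and ATCG become N
    $s =~ tr/A-Zatcg/n/c;   # pass 2: all but upper-case letters and atcg become n
    return $s;
}

# Hypothetical wrapper around the Ord entry, used only as a cross-check.
sub mask_ord {
    my ($s) = @_;
    $s =~ s/([^ACTGactg])/chr(ord('N') - ord(uc($1)) + ord($1))/eg;
    return $s;
}

my $seq = join "", '0' .. '9', 'a' .. 'z', 'A' .. 'Z';
print mask_tr($seq), "\n";
print "Tr and Ord agree\n" if mask_tr($seq) eq mask_ord($seq);
```

Pass 1 leaves the lower-case letters alone so that pass 2 can still tell
them apart; the N's written by pass 1 are upper-case and so survive pass 2.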

-- 
Larry Rosler
Hewlett-Packard Company
http://www.hpl.hp.com/personal/Larry_Rosler/
lr@hpl.hp.com 
