[Date Prev][Date Next][Thread Prev][Thread Next] [Search] [Date Index] [Thread Index]

Re: [FWP] Weighted sampling



On Mon, Mar 19, 2001 at 04:44:51PM -0500, yanick1@sympatico.ca wrote:
> On Mon, Mar 19, 2001 at 10:14:35PM +0100, abigail@foad.org wrote:
> > That wouldn't work if @weights contains non-integers, or when the
> > sum of the weights is huge.
> > 
> >     sub sample {
> >         my ($count, $sample);
> >         for (my $i  = 0; $i < @weights; $i ++) {
> >             $count += $weights [$i];
> >             $sample = $i if rand $count < $weights [$i];
> >         }
> >         $sample;
> >     }
> > 
> > This uses O (1) additional memory, and only requires @weights to contain
> > non-negative numbers.
> 
> And 
> 
>      sub sample {
>          my ($count, $sample) = ( $weights[0] );
>          for (my $i  = 1; $i < @weights; $i ++) {
>              $count += $weights [$i];
>              $sample = $i if rand $count < $weights [$i];
>          }
>          $sample;
>      }
> 
> would save one useless call of rand.

Yeah, but if all weights are 0, you return the first element of the set,
and I return undef. Plus, why special case 0, if there's no need to?

> But wouldn't 
> 
> %sample       = ( a => 1, b => 0.5, c => 613 );
> $sample_sum  += $sample{ $_ } foreach keys %sample;

I guess you want values here, not keys.

> @sample_index = sort { $sample{$b} <=> $sample{$a} } keys %sample;
> 
> sub draw
> {
> 	my $count = rand $sample_sum;
> 	for( @sample_index )
> 	{
> 			$count -= $sample{ $_ };
> 			return $_ if $count <= 0;
> 	}
> }
> 
> be much faster in average?


Not by a long shot! You are garanteed to take Omega (n log n) time,
due to your sort, while a single loop takes O (n). Not that the
sort is necessary in any way.

Then there's the O (n) additional memory (keys %sample).



Abigail

==== Want to unsubscribe from Fun With Perl?  Well, if you insist...
==== Send email to <fwp-request@technofile.org> with message _body_
====   unsubscribe