On Tue, Mar 20, 2001 at 12:15:05AM +0100, abigail@foad.org wrote:
> On Mon, Mar 19, 2001 at 04:44:51PM -0500, yanick1@sympatico.ca wrote:
> > sub sample {
> >     my ($count, $sample) = ( $weights[0] );
> >     for (my $i = 1; $i < @weights; $i ++) {
> >         $count += $weights [$i];
> >         $sample = $i if rand $count < $weights [$i];
> >     }
> >     $sample;
> > }
> >
> > would save one useless call of rand.
>
> Yeah, but if all weights are 0, you return the first element of the
> set, and I return undef.

True.

> Plus, why special case 0, if there's no need to?

Because a random number is a terrible thing to waste? :)

> > $sample_sum += $sample{ $_ } foreach keys %sample;
>
> I guess you want values here, not keys.

*blush*

> > @sample_index = sort { $sample{$b} <=> $sample{$a} } keys %sample;
> >
> > sub draw
> > {
> >     my $count = rand $sample_sum;
> >     for( @sample_index )
> >     {
> >         $count -= $sample{ $_ };
> >         return $_ if $count <= 0;
> >     }
> > }
> >
> > be much faster on average?
>
> Not by a long shot! You are guaranteed to take Omega (n log n) time,
> due to your sort, while a single loop takes O (n).

Aye. But the sort only has to be performed once (assuming, of course,
that the population does not change from one call of sample() to the
next). If you draw enough samples, this one-time overhead is overcome
by the smaller constant K it provides for the loop's O(n)itude.

> Not that the sort is necessary in any way.

Not necessary, but helpful if the population has a few observations
with very large weights, and many, many with small weights. The sort
only makes sure we try our luck first with the elements that have the
highest probability of being chosen, thus exiting the loop faster
(your loop goes through all the population elements each time; mine
does so only in the worst case).

> Then there's the O (n) additional memory (keys %sample).

Guilty as charged on that one.

This being said, if I run some tests, I get:

    Benchmark: running abi, yanick_sorted, yanick_unsorted, each for
    at least 10 CPU seconds...
                abi:   28.67/s (n=318)
      yanick_sorted: 5387.94/s (n=58513)
    yanick_unsorted:   70.09/s (n=710)

(code included below)

Granted, I have taken an ugly sample, and excluded the creation of
%sample from the test. The 'n' calls to rand for each sample seem to
be quite costly, time-wise (not to mention that n random numbers per
draw are not as handy as a single one when the time comes to play with
antithetic values and correlation analysis).

Joy,
Yanick

--

use Benchmark;

# Ugly sample: 999 elements of weight 1, one of weight 99999.
@weights = ( (1) x 999, 99999 );
%sample  = map { $_ => $weights[$_] } 0..$#weights;
$sample_sum += $_ foreach values %sample;

# Indices sorted by decreasing weight, so draw() can exit early.
@sample_index = sort { $sample{$b} <=> $sample{$a} } keys %sample;

$nbr_sample = 5000;

timethese( -10, { 'abi'             => \&sample,
                  'yanick_sorted'   => \&draw,
                  'yanick_unsorted' => \&draw_unsorted } );

# Abigail's one-pass sampler: one call to rand per element.
sub sample {
    my ($count, $sample);
    for (my $i = 0; $i < @weights; $i ++) {
        $count += $weights [$i];
        $sample = $i if rand $count < $weights [$i];
    }
    $sample;
}

# A single call to rand per draw, walking the indices sorted by
# decreasing weight.
sub draw {
    my $count = rand $sample_sum;
    ( $count -= $sample{ $_ } ) <= 0 and return $_ for @sample_index;
}

# Same single-rand draw, but walking the indices in original order.
sub draw_unsorted {
    my $count = rand $sample_sum;
    ( $count -= $weights[ $_ ] ) <= 0 and return $_ for 0..$#weights;
}
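
Not part of Yanick's benchmark, but a quick way to convince oneself
that the single-rand draw really honours the weights is to tally a
batch of draws and compare the heavy element's observed frequency
against its share of the total weight. The sketch below assumes the
benchmark script above has already run, so draw() and its globals are
in scope; %tally and the 100_000 draw count are illustrative choices,
not anything from the thread.

my %tally;
$tally{ draw() }++ for 1 .. 100_000;

# Element 999 carries 99999 of the 100998 total weight, so it should
# come up in roughly 99% of the draws.
printf "element 999 drawn %.1f%% of the time\n",
    100 * ( $tally{999} || 0 ) / 100_000;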
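
The "sort once, draw many times" argument can be pushed a step
further, though neither poster proposes it here: with a fixed
population, one can precompute a cumulative weight table once and
binary-search it on each draw, making every draw O(log n) rather than
O(n) worst case. A minimal sketch of that standard technique, with
hypothetical names (@cumulative, draw_bsearch) that appear nowhere in
the thread:

use strict;
use warnings;

my @weights = ( (1) x 999, 99999 );

# One-time preprocessing: running totals of the weights.
my @cumulative;
my $total = 0;
$cumulative[$_] = ( $total += $weights[$_] ) for 0 .. $#weights;

# Each draw: pick a point in [0, $total), then find the first
# running total that exceeds it.
sub draw_bsearch {
    my $point = rand $total;
    my ( $lo, $hi ) = ( 0, $#cumulative );
    while ( $lo < $hi ) {
        my $mid = int( ( $lo + $hi ) / 2 );
        if ( $cumulative[$mid] > $point ) { $hi = $mid }
        else                              { $lo = $mid + 1 }
    }
    return $lo;
}

With the ugly sample above this buys little, since the sorted linear
scan already exits on its first element about 99% of the time, but it
should win when no single weight dominates.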