[Date Prev][Date Next][Thread Prev][Thread Next] [Search] [Date Index] [Thread Index]

[FWP] Shuffling (was Re: [FWP] Perl Card Games)



(Warning: this might be a bit for "fun with math" than fun with perl, but I
suspect the fraction of Perl geeks who are also math geeks is large.)

Oops!

It turns out that the simple shuffle idea that a few people have mentioned
on the list isn't actually a random shuffle. In fact, the Cookbook mentions
the fisher_yates shuffle as the Right Way to do it. Allow me to quote the
salient paragraph (for those few of you who have been too consumed with
False Laziness to buy the Cookbook):

----------------------------------------------------------------------
Shuffling is a surprisingly tricky process. It's easy to write a bad
shuffle:

    sub naive_shuffle { # don't do this
	for (my $i = 0; $i < @_; $i++) {
	    my $j = int rand @_;
	    ($_[$i], $_[$j]) = ($_[$j], $_[$i]);
	}
    }

This algorithm is biased; the lists' possible permutations don't all have
the same probablity of being generated. The proof of this is simple: take
the case where we're passed a 3-element list. We generated three random
numbers, each of which can have three possible values, yielding 27 possible
outcomes here. There are only 6 permutations of the 3-element list, though.
Because 27 isn't evenly divisible by 6, some outcomes are more likely than
others. 

The Fisher-Yates shuffle avoids this bias by changing the range of
the random numbers it selects.
----------------------------------------------------------------------

(As a reminder, Fisher-Yates is just like the naive shuffle, except you
loop *down* instead of up, and take rand($i+1) instead of rand(@_).)

I thought about this a bit, and it's pretty cool. It turns out (if you
waste some time doing shuffling in your head) that in a three element list,
the first element has a 1/3 chance of ending up anywhere. However, the
second element & third have chances of 1/3, 8/27 (.37037), and 10/27
(.2963) of landing in various locations.

Meanwhile, the Fisher-Yates is algorithmically really cool. If we go back
to 52 cards, it works like this: the second card has a 1/52 chance of being
switched to position 52 in the first time through the loop. What's the
chance of it being switched to position 51? Well, the chance that it wasn't
already switched to 52 times the chance that 51 chooses to switch with it,
or 51/52 * 1/51 = 1/52. And so on!

This got me curious as to whether my Schwartzian method was truly random.
So I wrote a program to find out. It does 10^6 shuffles and counts how
often a list member (e.g. the second of the three members) shows up in each
position. I got:

   Shuffle Name        0        1        2  
--------------------------------------------------
    Naive (bad)   0.3712   0.2960   0.3329  
   Fisher-Yates   0.3335   0.3334   0.3331  
    Schwartzian   0.3332   0.3332   0.3335  

So it *looks* like the schwartzian method *is* random, and the naive method
is in fact non-random. When I tried using larger sets, btw, I found that
the standard deviation of the naive algorithm's probabilities away from 1/N
is about 2-3 times the FY or S algorithms.

-Amir

==== Want to unsubscribe from Fun With Perl?  Well, if you insist...
==== Send email to <fwp-request@technofile.org> with message _body_
====   unsubscribe