[Date Prev][Date Next][Thread Prev][Thread Next] [Search] [Date Index] [Thread Index]

[MacPerl] random selection proof



Okay, it's math time!  Here is my proof of the solution

#!perl -n
$line = $. unless int rand $.;
END { print $line }

for selecting a random line from a file.



The solution works if each line of input has an equal chance of being
selected.  For N lines of input, each line should have a 1/N chance of
being selected.



Proof by induction.


1.  Solution works for N = 1.

When there is one line, this solution will always pick that line.
   $line = $. unless int rand $.;
   # 'int rand $.' is always 0 when $. is 1.


2.  If solution works for N-1, solution also works for N.

By assumption, after reading in N-1 lines, each line has a 1/(N-1) chance
of being selected.

When line N is read in, that line has a 1/N chance of being selected.
   $line = $. unless int rand $.;
   # 'int rand $.' will be 0 1/$. of the time
Thus, line N has a probability of being selected of 1/N.

If line N is not selected, one of lines 1 .. N-1 will remain selected.
The probability of N not being selected is 1 - (1/N) = (N-1)/N.
For any line 1 .. N-1, the probability of being the previously selected
line is 1/(N-1) [see above].  Together, the probability that a line was
previously selected and that the line remains selected is
1/(N-1) * (N-1)/N = 1/N.
Thus, lines 1 .. N-1 each have a probability of being selected of 1/N.

Therefore, with N lines each line has a probability of being selected of
1/N.


Conclusion.

By step 1, the solution works for N = 1, and by step 2 the solution works
for all N greater than 1.


QED.



[*] The proof assumes that 'int rand($.)' is perfectly random, which it's
not.  It should be close enough for most uses, though.  :)



Did you find that helpful?  Confusing?  Comments appreciated!

Ronald

===== Want to unsubscribe from this list?
===== Send mail with body "unsubscribe" to macperl-request@macperl.org