[Date Prev][Date Next][Thread Prev][Thread Next] [Search] [Date Index] [Thread Index]

[FWP] A Million Monkeys



Hi All,

Here's a bit of fun I've been having with Perl and the Project Gutenberg
archives. 

This programme was inspired by an example provided in a Turbo Basic version
4.0 (<shudder>) training guide which I have long since disposed of. It
surrounds the well known question: If you had a million immortal monkeys
with typewriters and left them hitting keys how long would it take them to
write the complete works of Shakespeare.  

The TB example was to give the monkeys a helping hand by making some keys
bigger than others. They computed a frequency array of 26 letters plus " "
and "." from a piece of source text. The keys sizes were then made
proportional to the frequency. The monkey was simulated using random numbers
to generate more "Shakespearean" random text. It was still gibberish but it
drastically improved the time the monkeys would take (zillions of years down
to mirco-zillions :-). The exercise of this section was to write a 28 x 28
matrix version so that not only were the key sizes different, but they
changed shape whenever the monkey pressed a key. For example after a "q" you
normally expect a "u" so the "u" key should become large and all the others
small. 

The N-dimensional version of the problem is very simple to code in Perl
using a hash as a sparce frequency array. Here's a bit of (formatted) random
Hamlet using a spare 7 dimensional array. "Come hither murder me I will. Nor
I my lord I have grey bear will board delay the grave. But two special
oblivion or sixteen lines his fathers if these you receiving out Polonius."

My advice is get some Shakespeare, or any other seed text you like and see
what this comes up. I think the "Come hither murder me I will" was quite fun
for Hamlet. What's even better is if you mix up the seeds. I mixed Wuthering
Heights with my diary when I was 16 and it came out truly weird.  Next time
you get asked to summarise a document, why not use this?

For etext sources check out:
http://www.promo.net/pg/_authors/i-_shakespeare_william_.html

Here's a very basic version of the code. I kept it brief rather than include
all the features I now use. 

use strict;
my @allChars=split // , '.  abcdefghijklmnopqrstuvwxyz';
my (%freq);

my $lookAhead=7;
ComputeFreq();
my $outSize=1000;

# Get a "random" seed text of length $lookahed-1 
my $seed=(keys %freq)[0]; chop($seed);

# print $outSize many GetNextChars
while ($outSize-- > 0 ) {
	print ($_=GetNextChar() );
	print "\n" if ($_ eq ".");
};

sub ComputeFreq{
	my $a=0;
	#Read source as one single line
	undef $/; $_=<STDIN>; $/="\n";
	tr/A-Z/a-z/;    #Convert to lower case
	s/\?|\!/\./g;   #Convert all sentence endings into fullstops
	s/[\'\"\`]//g;  #Remove all apostrophes and quotes
	s/[^a-z\.]/ /g; #Replace non @allChars with spaces
	s/ +/ /g;       #Replace spaces sequences with one space
	s/\.+/\./g;     #Replace fullstops sequences with one fs.
	s/\. /\./g;     #Replace fullstop space with one fs.
	# Store a frequency count of each $lookAhead substrings
	while ($a + $lookAhead < length($_)) {
		$freq{substr($_ ,$a, $lookAhead)}++;
		$a++;
	};
}

sub GetNextChar {
	my ($sumFreq,$stopAtFreq)=0;
	foreach (@allChars) {
		$sumFreq+=$freq{$seed.$_};
	};
	$stopAtFreq = rand $sumFreq;
	$sumFreq=0;
	foreach (@allChars) {
		$sumFreq+=$freq{$seed.$_};
		if ($stopAtFreq < $sumFreq)  {
			$seed=substr($seed,1).$_;
			return $_ ;
		};
	};
	# The code should never reach here unless we picked the
	# last characters of the source text. 
}


----------------------------------------------------------------------
Alistair McGlinchy,           alistair.mcglinchy@marks-and-spencer.com
Capacity Planning, IT Ops     ph +44 0 171-268-5012    ext 5012
Marks and Spencer             fx +44 0 171-268-5721 

$$2^{\aleph_0} <\aleph_\omega \Rightarrow  {\aleph_\omega}^{\aleph_0}<
\aleph_{\omega_4}$$
- Saharon Shelah, 1994



_____________________________________________________________________________________________________

Registered Office:
Marks and Spencer plc
Michael House, Baker Street,
London, WIA IDN
Registered No. 214436 in England and Wales.

Telephone 0171-935-4422
Facsimile 0171-487-2670

www.marks-and-spencer.com

This e-mail is Confidential.  If you received it by mistake, please let us know and then 
delete it from your system; you should not copy, disclose, or distribute its contents to 
anyone nor act in reliance on this e-mail, as this is prohibited and may be unlawful.

_____________________________________________________________________________________________________

==== Want to unsubscribe from Fun With Perl?  Well, if you insist...
==== Send email to <fwp-request@technofile.org> with message _body_
====   unsubscribe