I created an algorithm which I believe is fast, and kind of mimicks the Boyer-Moore (or some other efficient string matching algorithm) mechanism of jumping forward. The program is beneath my signature file. Here is an explanation of its operation: Assume a string like "the aftermath". The loop will start looking at the first character, 't'. It then scans the rest of the string for the next occurrence of 't' -- if there is one, that is the upper bound for the unique-character substring (hereafter referred to as UCS). This upper bound is at character 6. The loop also caches this occurrence in the @ahead array ($ahead[0] = 6). Next, we move to the 'h', at offset 1. We set $ahead[1] to 12, but we do not adjust the bound of the substring, since 12 > 6. Next the 'e' (and $ahead[2] = 7), and the space ($ahead[3] is -1). Once we've made our way to the "stop character", 't', we have our substring, "the af". This is pushed to the list of potential matches. The "stop character" is the character that would have created the duplication. We then move to the character AFTER the occurrence of the stop character in the substring (so we start at the 'h' in 'the'). We can make this jump because any string containing the stop character will have to stop at this second occurrence of it. If, for example, we had "the new autumn", then the stop character for the first UCS is 'e'. This means that we now skip to the ' ' after the word 'the' for our next possible UCS. The other short circuit is done when we have too little content left in our string, we can quit searching. The final @ahead array for the string "the aftermath" is: char: t h e _ a f t e r m a t h pos: 0 1 2 3 4 5 6 7 8 9 10 11 12 ahead: 6 12 7 -1 10 -1 11 -1 -1 -1 -1 -1 -1 Keep in mind that the full @ahead array needn't be made (and in this case, it isn't -- the 't' and 'h' at the end of the string are never reached). I hope this can be optimized more, though. You can run the program through this email, via japhy% perl -x name_of_email -- Jeff "japhy" Pinyan japhy@pobox.com http://www.pobox.com/~japhy/ CPAN - #1 Perl Resource (my id: PINYAN) http://search.cpan.org/ PerlMonks - An Online Perl Community http://www.perlmonks.com/ The Perl Archive - Articles, Forums, etc. http://www.perlarchive.com/ #!/usr/bin/perl -w use strict; while (<>) { chomp; my @longest = pinyan_UCS($_); my $str = $_; print map substr($str, $_->[0], $_->[1] - $_->[0]) . "\n", @longest; print "\n"; } sub pinyan_UCS { my $str = shift; my $len = length $str; my @c = split //, $str; my ($diff,$biggest) = (0,0); my (@ahead,@matches); for (my $i = 0; $i < $len; ) { my $c = $c[$i]; my $match = [ $i, $len, $i+1 ]; if ($len - $i >= $biggest) { for (my $k = $i; $k < $match->[1]; $k++) { $ahead[$k] ||= index($str, $c[$k], $k+1); if ($ahead[$k] != -1 and $match->[1] > $ahead[$k]) { $match->[1] = $ahead[$k]; $match->[2] = $k+1; } } $diff = $match->[1] - $match->[0]; if ($diff > $biggest) { ($biggest,@matches) = ($diff,$match) } elsif ($diff == $biggest) { push @matches, $match; } } else { last } $i = $match->[2]; } return @matches; } ==== Want to unsubscribe from Fun With Perl? Well, if you insist... ==== Send email to <fwp-request@technofile.org> with message _body_ ==== unsubscribe