A while back I saw some output from a web log analyzer that had a nifty, tree-based way of summarizing traffic. Kind of like this: 192.168.0.1 (14%) 192.168.0.2 (14%) com (57%) | bar.com (14%) | foo.com (42%) | | mummble.foo.com (14%) org (14%) | baz.org (14%) | | test.baz.org (14%) I had need of such a visualization the other day, and so the following: #!/usr/bin/perl -w # code fragments for a tree visualization of domain occurrences (e.g., # as part of a web log analyzer). # # Dave Smith <dws@postcognitive.com> use strict; my $root = [0, # tally for this node {}]; # child nodes, keyed by child name sub note_domain { my $domain = shift; my @parts = $domain =~ /^\d+\.\d+\.\d+\.\d+$/ ? ( $domain ) : reverse split(/\./, $domain); $$root[0]++; # increment the overall tally my $node = $root; foreach my $part ( @parts ) { if ( not exists $$node[1]{$part} ) { $$node[1]{$part} = [0, {}]; } $node = $$node[1]{$part}; $$node[0]++; } } sub summarize_domains { _summarize_domains(0, $root, ""); } sub _summarize_domains { my($level, $node, $basename) = @_; foreach my $part ( sort keys %{$$node[1]} ) { my $subnode = $$node[1]{$part}; my $percent = int(100 * ($$subnode[0] / $$root[0])); next if $percent < 1; my $newname = $part; $newname .= ".$basename" if $basename ne ""; print "| " x $level, $newname, " (", $percent, "%)\n"; _summarize_domains($level + 1, $subnode, $newname); } } note_domain("foo.com"); note_domain("foo.com"); note_domain("mummble.foo.com"); note_domain("bar.com"); note_domain("test.baz.org"); note_domain("192.168.0.1"); note_domain("192.168.0.2"); summarize_domains(); ==== Want to unsubscribe from Fun With Perl? Well, if you insist... ==== Send email to <fwp-request@technofile.org> with message _body_ ==== unsubscribe