[First of all, an administrative note on a recurring topic: This is a *Mac*Perl list, stick to *Mac*Perl questions and answers!]

As usual, people here deserve credit for their helpfulness but were, once more, all too eager to jump in with non-MacPerl-specific solutions to a non-MacPerl-specific problem. A big, extra BOO! to Chris Nandor for suggesting moving the data processing to a Unix server (Stone the infidel :-), although it must be said that he also suggested File::Sort, which would solve the problem on a Mac.

Here's another solution which works on a Mac, using the much underappreciated DB_BTREE (which keeps data implicitly sorted by key). I consider it MacPerl specific in that it is still practicable with 300M files.

Strider <Strider@baka.com> writes:
>This is frustrating. I've done everything here, and I can't get this script
>to work. It takes summarized log entries (in the form of
>"Username\tDate\tlogins\tbad disconnects\tserver\n") and SHOULD sort them
>by date, then username, and if both are the same, combine the records.

>S 9/5/97 554 1 0 c3
>t 9/2/97 14403 1 0 w2
>t 9/3/97 14404 1 0 w2
>t 9/5/97 33059 3 0 w2

-------------------------------
#!perl

use DB_File;

tie %DB, 'DB_File', "db", O_RDWR|O_CREAT|O_TRUNC, 0666, $DB_BTREE or die $!;

while(<>) {
    ($user,$date, $l1, $l2, $l3, $host) = split;
    ($month,$day,$year) = split "/", $date;

    # Adjust 2 digit years for Y2K
    $year += ($year>100) ? 0 : ($year>90) ? 1900 : 2000;

    # Reorder date in YMD form for later comparison
    $key = sprintf("%04d%02d%02d %s", $year, $month, $day, $user);

    if ($data = $DB{$key}) {
        # Existing record, update
        ($cl1, $cl2, $cl3, $hosts) = split(" ", $data, 4);
        $l1 += $cl1;
        $l2 += $cl2;
        $l3 += $cl3;
        $host = ($hosts =~ /\b$host\b/) ? $hosts : "$hosts $host";
    }
    $DB{$key} = "$l1 $l2 $l3 $host";
}

while (($key, $data) = each %DB) {
    # Unsophisticated format, but you get the point
    print "$key $data\n";
}

untie %DB;
----------------------------------------

While File::Sort probably beats this solution for processing entire files, the above solution can be transformed into an incremental solution (where you just add the latest batch of data every week) simply by leaving out the O_TRUNC.

Please note also the attempt to enforce some year 2000 sanity in the face of the deficient input format, while at the same time allowing for a switch to a 4 digit input year. Note also the \b in the host match, as otherwise "winter" would match "wintermute".

Matthias

-----
Matthias Neeracher <neeri@iis.ee.ethz.ch> http://www.iis.ee.ethz.ch/~neeri
"Larry Wall got us 15kg of cyclonite. We're waiting for the final
go-ahead then we'll blow them all to hell."
-- (Presumably random) .sig generated by the Berkeley anonymous remailer.
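
For anyone who wants to try the key construction and the host matching from the script on their own, without a database file, here is a minimal sketch. The helper names make_key and add_host are hypothetical and do not appear in the script; the \Q...\E quoting in the host match is also an addition, assumed here so that a host name containing regex metacharacters is matched literally.

```perl
#!perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): build a sortable
# "YYYYMMDD user" key from an M/D/Y date, using the same year window.
sub make_key {
    my ($user, $date) = @_;
    my ($month, $day, $year) = split "/", $date;
    # Years above 100 are left alone, 91 through 100 get 1900 added,
    # and anything from 0 through 90 gets 2000 added.
    $year += ($year > 100) ? 0 : ($year > 90) ? 1900 : 2000;
    return sprintf("%04d%02d%02d %s", $year, $month, $day, $user);
}

# Hypothetical helper: append $host to a space-separated list unless it
# is already present as a whole word. \Q...\E is an added assumption.
sub add_host {
    my ($hosts, $host) = @_;
    return ($hosts =~ /\b\Q$host\E\b/) ? $hosts : "$hosts $host";
}

print make_key("t", "9/5/97"), "\n";             # 2 digit year in the 1900s
print make_key("t", "1/2/03"), "\n";             # 2 digit year in the 2000s
print make_key("t", "9/5/1997"), "\n";           # 4 digit year passes through
print add_host("wintermute w2", "winter"), "\n"; # "winter" is not yet listed
print add_host("wintermute w2", "w2"), "\n";     # "w2" is already listed
```

Because the key starts with a zero-padded year, month and day, plain string order of the keys is also date order, which is all the B-tree needs to keep the records sorted by date and then by user.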