
Re: [MacPerl] Parsing Script



[First of all, an administrative note on a recurring topic:

This is a *Mac*Perl list, stick to *Mac*Perl questions and answers!
]

As usual, people here deserve credit for their helpfulness but were,
once more, all too eager to jump in with non-MacPerl specific solutions
to a non-MacPerl specific problem. A big, extra BOO! to Chris Nandor
for suggesting moving the data processing to a Unix server (Stone
the infidel :-), although it must be said that he also suggested
File::Sort, which would solve the problem on a Mac.

Here's another solution which works on a Mac, using the much 
underappreciated DB_BTREE (which keeps data implicitly sorted).
I consider it MacPerl specific in that it is still practicable with
300M files.
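
As a small illustration of that implicit sorting, here is a minimal sketch,
assuming a DB_File built against Berkeley DB is available. It uses an
in-memory BTREE (undef as the file name) and keys in the "YYYYMMDD user" form
used by the script further down; the variable names are made up for the
example.

```perl
use strict;
use warnings;
use DB_File;   # also exports the O_* flags and $DB_BTREE

# In-memory BTREE (undef file name), so nothing is written to disk.
my %sorted;
tie %sorted, 'DB_File', undef, O_RDWR|O_CREAT, 0666, $DB_BTREE
    or die "Cannot tie in-memory BTREE: $!";

# Keys in "YYYYMMDD user" form, deliberately stored out of order.
$sorted{"19970905 t"} = "33059 3 0 w2";
$sorted{"19970902 t"} = "14403 1 0 w2";
$sorted{"19970905 S"} = "554 1 0 c3";
$sorted{"19970903 t"} = "14404 1 0 w2";

# A BTREE hands the keys back in (by default lexical) key order.
my @order = keys %sorted;
print "$_\n" for @order;

untie %sorted;
```

No explicit sort is involved: the order comes from the BTREE itself, which is
why the final loop of the script can simply walk the hash.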

Strider <Strider@baka.com> writes:
>This is frustrating. I've done everything here, and I can't get this script
>to work. It takes summarized log entries (in the form of
>"Username\tDate\tlogins\tbad disconnects\tserver\n") and SHOULD sort them
>by date, then username, and if both are the same, combine the records. 

>S	9/5/97	554	1	0	c3
>t	9/2/97	14403	1	0	w2
>t	9/3/97	14404	1	0	w2
>t	9/5/97	33059	3	0	w2

-------------------------------
#!perl

use DB_File;

tie %DB,  'DB_File', "db", O_RDWR|O_CREAT|O_TRUNC, 0666, $DB_BTREE
   or die $!;

while(<>) {
   ($user,$date, $l1, $l2, $l3, $host) = split;
   ($month,$day,$year) = split "/", $date;
   # Adjust 2 digit years for Y2K
   $year += ($year>100) ? 0 : ($year>90) ? 1900 : 2000;
   # Reorder date in YMD form for later comparison
   $key = sprintf("%04d%02d%02d %s", $year, $month, $day, $user);
   if ($data = $DB{$key}) { # Existing record, update
      ($cl1, $cl2, $cl3, $hosts) = split(" ", $data, 4);
      $l1 += $cl1;
      $l2 += $cl2;
      $l3 += $cl3;
      # \Q...\E keeps dots and other metacharacters in $host literal
      $host = ($hosts =~ /\b\Q$host\E\b/) ? $hosts : "$hosts $host";
   }

   $DB{$key} =  "$l1 $l2 $l3 $host";
}

while (($key, $data) = each %DB) {
   # Unsophisticated format, but you get the point
   print "$key $data\n";
}

untie %DB;
----------------------------------------

While File::Sort probably beats this solution for processing entire files, 
the above solution can be transformed into an incremental solution (where 
you just add the latest batch of data every week) simply by leaving out 
the O_TRUNC.
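
To make the incremental variant concrete, here is a rough sketch, assuming
DB_File is installed. The helpers open_db and add_batch, the file name
"logins.db" and the single login counter per key are all invented for the
illustration; the real script stores more fields per record.

```perl
use strict;
use warnings;
use DB_File;                  # also exports the O_* flags and $DB_BTREE
use File::Temp qw(tempdir);

# Hypothetical helper: tie a BTREE file, truncating it only on request.
sub open_db {
    my ($db, $path, $truncate) = @_;
    my $flags = O_RDWR | O_CREAT;
    $flags |= O_TRUNC if $truncate;   # leave this out to keep old data
    tie %$db, 'DB_File', $path, $flags, 0666, $DB_BTREE
        or die "Cannot tie $path: $!";
}

# Hypothetical helper: add one batch of [key, logins] pairs to the file
# and return a plain-hash snapshot of its contents afterwards.
sub add_batch {
    my ($path, $truncate, @records) = @_;
    my %db;
    open_db(\%db, $path, $truncate);
    for my $rec (@records) {
        my ($key, $logins) = @$rec;
        $db{$key} = ($db{$key} || 0) + $logins;
    }
    my %copy = %db;
    untie %db;
    return \%copy;
}

my $path  = tempdir(CLEANUP => 1) . "/logins.db";
my $week1 = add_batch($path, 1, ["19970905 t", 3]);          # fresh start
my $week2 = add_batch($path, 0, ["19970905 t", 2],           # incremental
                                ["19970906 t", 1]);
print "$_ => $week2->{$_}\n" for sort keys %$week2;
```

The second call opens the same file without O_TRUNC, so the count stored by
the first call is still there and gets added to.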

Please note also the attempt to enforce some year 2000 sanity in the face 
of the deficient input format, while at the same time allowing for a switch 
to a 4 digit input year.
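
The year handling can be tried in isolation. The following sketch wraps the
same expression and the same sprintf format in two subs; the names full_year
and sort_key are mine, not part of the script.

```perl
use strict;
use warnings;

# Same windowing rule as the script: values above 100 are taken as full
# years, 91..99 become 19xx, everything else becomes 20xx.
sub full_year {
    my ($year) = @_;
    return $year + (($year > 100) ? 0 : ($year > 90) ? 1900 : 2000);
}

# Build the "YYYYMMDD user" key from an M/D/Y date and a user name.
sub sort_key {
    my ($date, $user) = @_;
    my ($month, $day, $year) = split "/", $date;
    return sprintf("%04d%02d%02d %s", full_year($year), $month, $day, $user);
}

print sort_key("9/5/97",   "t"), "\n";   # 2 digit year, 19xx window
print sort_key("9/5/1997", "t"), "\n";   # 4 digit year, left alone
print sort_key("1/2/03",   "t"), "\n";   # 2 digit year, 20xx window
```

Because the key starts with a zero-padded year, month and day, plain string
comparison orders the records by date first and user name second.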

Note also the \b in the host match, as otherwise "winter" would match 
"wintermute".
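
A quick sketch of that host test on its own, with a made-up helper name
(merge_host). It additionally quotes the host name with \Q...\E so that dots
in host names are taken literally; treat that as my assumption about what
host names may look like.

```perl
use strict;
use warnings;

# Hypothetical helper: append $host to the space-separated $hosts list
# unless it is already present as a whole word.
sub merge_host {
    my ($hosts, $host) = @_;
    return ($hosts =~ /\b\Q$host\E\b/) ? $hosts : "$hosts $host";
}

# "winter" is only a prefix of "wintermute", so it is added as a new host.
print merge_host("wintermute", "winter"), "\n";

# "c3" is already in the list as a whole word, so the list is unchanged.
print merge_host("w2 c3", "c3"), "\n";
```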

Matthias

-----
Matthias Neeracher   <neeri@iis.ee.ethz.ch>   http://www.iis.ee.ethz.ch/~neeri
  "Larry Wall got us 15kg of cyclonite.  We're waiting for the final go-ahead 
   then we'll blow them all to hell."
     -- (Presumably random) .sig generated by the Berkeley anonymous remailer.
