[MacPerl] Porting a Perl Program




Hi there,

I have written a very small program in Perl to read in HTML files from a
remote source and then count the 5 most frequent words.  This works fine on
UNIX, but I am a bit confused about how to get the same functionality on
the Mac.  The part of the program which is giving me hassle is the '|sort'
on the open(SESAME) line, which obviously is not possible on a Mac (no
command line ;-).  The program is included below.

I am then hoping to write these values into a FileMaker DB on a Mac
webserver running WebSTAR, Tango and FileMaker 3.0.  Does anyone know how to
do this?

The point of the program is to try and determine what people are interested
in by tracking what they read on a website.

If there is anyone out there who could suggest a workaround for the pipe to
sort and for the FileMaker part, then I would *REALLY* appreciate a
reply!

Thanks very much,

Shyam.



#!/usr/bin/perl -w
#
# STRIP Revision 2.x
# Shyam Hegde
#

    use strict;
    use LWP::Simple;
    use HTML::Parse;
    use HTML::FormatText;
    my ($html, $ascii, $url);


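    $url = shift || "http://127.1/";
    # Add a scheme to bare paths and host names. The www/ftp tests require
    # the dot so that a full "ftp://..." URL is not prefixed a second time.
    $url = "file://$url" if $url =~ m#^/#;
    $url = "http://$url" if $url =~ m#^www\.#;
    $url = "ftp://$url"  if $url =~ m#^ftp\.#;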



    $html = get($url);
    defined $html or die "Can't fetch HTML from $url";
    $ascii = HTML::FormatText->new->format(parse_html($html));

    my %seen;
    my $line;
    my $temp;
    my @allwords;
    my $i=0;
    my $point=0;
    my $flag=0;
    my $lower;
    my $tempvar;
    my @temparray;

    open (EXCLUDE, "exclusions.inc") || die "Can't open file: $!\n";
    @allwords = <EXCLUDE>;
    close EXCLUDE;


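    # Count every word that is not on the exclusion list.
    while ( $ascii =~ /\b(?=[a-z])(\w+)\b/gi )
    {
        $lower = lc $1;
        $flag = check($lower);
        if ($flag != 1)
        {
                $seen{$lower}++;
        }
    }

    # Return 1 if the given (lower-case) word is in @allwords, else 0.
    # The word is passed as an argument rather than read from $1.
    sub check {
        my $word = shift;
        my $point=0;

        for $i ( 0 .. $#allwords)
        {
                $temp = $allwords[$i];
                chomp $temp;    # chomp only strips a trailing newline
                if ( (lc $temp) eq $word )
                {
                        $point = 1;
                }

        }
        return $point;
    }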


    open(SESAME, "|sort +1rn >keywords.out") || die "Can't start sort: $!\n";
    while ( my($word, $count) = each %seen )
    {
        print SESAME "$word\t\t\t\t$count\n";
    }
    close(SESAME);


    open(TOPTEN, "keywords.out") || die "Can't open file: $!\n";
    @temparray = <TOPTEN>;
    close(TOPTEN);

    open(TOPTEN, ">keywords.out") || die "Can't open file: $!\n";
    if ( $#temparray >= 10 )
    {
        for $i ( 0 .. 9 )
        {
                $tempvar = $temparray[$i];
                print TOPTEN $tempvar;
        }
    }
    elsif ( $#temparray >= 0 )
    {
        for $i ( 0 .. $#temparray )
        {
                $tempvar = $temparray[$i];
                print TOPTEN $tempvar;
        }
    }
    else
    {
        print TOPTEN "NO_KEYWORDS";
    }
    close(TOPTEN);
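
One possible workaround for the pipe is to do the sorting inside Perl with
the built-in sort, so that no external command is needed.  What follows is
only a sketch under assumptions: the sub name top_words, the alphabetical
tie-break and the sample counts are made up for illustration and are not
part of the program above.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: return up to $n "word<TAB>count" lines,
# highest count first, ties broken alphabetically.
sub top_words {
    my ($counts, $n) = @_;
    my @words = sort { $counts->{$b} <=> $counts->{$a} or $a cmp $b }
                keys %$counts;
    $#words = $n - 1 if @words > $n;    # keep only the first $n words
    return map { "$_\t$counts->{$_}\n" } @words;
}

# Made-up sample data standing in for %seen.
my %seen = ( perl => 7, mac => 4, html => 4, sort => 1 );

my @lines = top_words(\%seen, 3);
print @lines ? @lines : "NO_KEYWORDS";
```

Printing the returned lines to a filehandle opened on ">keywords.out"
would probably replace both the pipe and the step that reads the file back
in to trim it.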

__________________________________________________
Developers                                   @ C=+

dev@cequel.co.uk         #include 'cheesyquote.h';
__________________________________________________


