>On Mon, 10 Feb 1997, Chris Hammond-Thrasher wrote:
>
>> Vicki's request reminds me of something that I really need in the next week
>> or so, a Perl/MacPerl script that goes beyond a link checker to acting as a
>> simple web robot. What I need is an app that follows all the links on a
>> single web page and locally saves all of the linked documents. Any help
>> would be appreciated.
>
>Pick up the "Web Tondeuse" program from info-mac. It's a Java program that
>can be run with the Mac Java runtime. It works great for this kind of thing.
>
> --- Joe M.

I'm just writing a simple script that takes a file with a list of URLs and
saves all of the documents to the hard disk. It also creates all the
necessary directories and fetches all local files and images referenced by
the requested document. Unfortunately, I'm still working on the
OS-independent directory creation, which may take a couple of days. I've
included an older version below which may help you, though it's not exactly
what you need. Email me if you're interested in the new version (it's free,
of course).

Greetings,

Erich Rast.

---------------------------------------
h0444zkf@rz.hu-berlin.de
http://www2.rz.hu-berlin.de/~h0444zkf/
---------------------------------------

#! /usr/bin/perl

use LWP::UserAgent;
use File::Basename;
use URI::URL;

=head1 NAME

B<Blowjob> - a simple http sucker

=head1 SYNOPSIS

I<Usage:> blowjob [<filename>]

<filename> is a file that contains a list of URLs separated by
newlines. The default <filename> is 'blowjob.job'.

=head1 DESCRIPTION

Reads a number of HTTP documents from WWW servers into a local default
folder. The URLs to read from are listed in the input file given on the
command line, one URL per line; lines beginning with # are ignored.
Requires the libwww-perl 5.03 library.

=cut

### startup

$version = "0.23";
print "Blowjob/$version (c) 1996 by E. Rast\n\n";

### filename and path & misc

$file = 'blowjob.job';                  # default input file
if ( $#ARGV > 0 ) { die "Too many arguments.\n"; }
if ( @ARGV )      { $file = $ARGV[0]; }

# split the job file name into name, path, and suffix
( $infile, $inpath, $suffix ) = fileparse( $file, '' );

### create user agent

$ua = new LWP::UserAgent;
$ua->agent( "Blowjob/$version" );

### main loop

open( IN, $file ) || die "Cannot open jobfile '$file': $!\n";
JOB: while ( $job = <IN> ) {
    next JOB if ( $job =~ /^#/ );       # skip comment lines
    chomp( $job );
    $url = new URI::URL $job;
    if ( $url->host ) {
        ++$count;
        $filename = "job-$count.html";  # local name for this document
        $res = &get_url( $url, $filename );
        if ( $res->is_success ) {
            print "Job-$count read: <$job>\n";
        }
        else {
            print "Couldn't get <$job> as '$filename'\n";
        }
    }
    else {
        print "Not a valid URL: '$job'\n";
    }
}
close IN;
print "Done.\n";

### fetch a URL and save the response body to a local file

sub get_url {
    my( $url, $file ) = @_;
    my $req = new HTTP::Request GET => $url;
    my $res = $ua->request( $req, $file );
    return $res;
}
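For reference, a job file for the script above might look like this (both
URLs are placeholders, not real addresses):

# blowjob.job -- one URL per line, lines starting with # are ignored
http://www.example.com/index.html
http://www.example.com/docs/faq.html

Then run it with:

blowjob myurls.job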
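As for the directory creation, one possible direction is a minimal sketch
along these lines, assuming the standard File::Path module that ships with
Perl 5. The URL and the 'mirror' folder are made-up examples, and the path
handling here is still Unix-style, so it is not truly OS-independent yet:

#! /usr/bin/perl
# Sketch: build the local directory tree for a fetched URL.
# Assumes the standard File::Path module; 'mirror' and the URL are
# made-up examples. Paths are Unix-style, so this part is not
# fully OS-independent yet.
use File::Path;
use File::Basename;
use URI::URL;

$url  = new URI::URL 'http://www.example.com/docs/pics/logo.gif';
$base = 'mirror';                           # local top-level folder
( $name, $dir ) = fileparse( $url->path, '' );
mkpath( "$base$dir", 0, 0755 );             # e.g. creates mirror/docs/pics/
print "Would save as $base$dir$name\n";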
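And for the link-following part of Chris's request, a rough sketch, assuming
HTML::LinkExtor from a newer libwww-perl distribution; the page URL is a
placeholder:

#! /usr/bin/perl
# Sketch: fetch one page, pull out its <a href> links, and save each
# linked document locally. Assumes HTML::LinkExtor from a newer
# libwww-perl; the page URL is a placeholder.
use LWP::UserAgent;
use HTML::LinkExtor;
use URI::URL;

$ua   = new LWP::UserAgent;
$page = 'http://www.example.com/index.html';

$res = $ua->request( new HTTP::Request( 'GET' => $page ) );
die "Couldn't get <$page>\n" unless $res->is_success;

# collect the href attribute of every <a> tag on the page
$p = new HTML::LinkExtor;
$p->parse( $res->content );
$p->eof;
for $link ( $p->links ) {
    ( $tag, %attr ) = @$link;
    push( @hrefs, $attr{href} ) if $tag eq 'a' && $attr{href};
}

# fetch each linked document into a numbered local file
$count = 0;
for $href ( @hrefs ) {
    $abs  = url( $href, $page )->abs;       # resolve relative links
    $file = "link-" . ++$count . ".html";
    $res  = $ua->request( new HTTP::Request( 'GET' => $abs ), $file );
    print $res->is_success ? "Saved <$abs> as '$file'\n"
                           : "Couldn't get <$abs>\n";
}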