[MacPerl-WebCGI] Indexing a remote site
I don't know if anyone else is annoyed by this, but recently I've been rooting about in the PDF versions of "Inside Macintosh" (http://developer.apple.com/techpubs/mac/pdf/). On previous visits I downloaded some zipped bundles of docs. Apple seems to have changed its way of doing things since then, and now you view them online, which is not a convenient option if you have to pay for telephone time. So I wrote the script below using LWP (which, to my total surprise and to the credit of its authors Gisle Aas and Martijn Koster, worked first time) to download the PDFs onto my hard disk. So far no complaints, _but_ I tried (and failed) to grep a file list that I hoped to pass to the rest of the script, and ended up having to compile the list of docs I wanted by hand. I couldn't seem to use readdir(), and LWP only fetches documents. It goes without saying that I don't have FTP or telnet access to the directory, but is there a way to generate a file list on the fly?
#!perl -w
#---------------------------------------------------------
#----------- declare includes
use strict;
use diagnostics -verbose;
use LWP::Simple;
#----------- declare variables
my($file,$fileDB,$key,$folder,$test,$path);
$fileDB="Path:to:pre-compiled:text:containing:file:names";
print "setting up.....\n";
open (IN, $fileDB) or die "can't open $fileDB: $!";
while(<IN>) {
chomp;
#reset the path to save the docs to
$folder='MY HD:Desktop Folder:programming docs:';
#reset the path to get the docs from
$file="http://developer.apple.com/techpubs/mac/pdf/";
print "getting file $_\n";
#set the path to the destination file
$folder="$folder$_";
print "save path is $folder\n";
#set the path to the file
$file="$file$_";
print "get path is $file\n";
#getstore returns the HTTP status code, so test that rather than definedness
$test=getstore($file,$folder);
is_success($test) or die "something screwy with getting $_: HTTP status $test";
print "$_ gotten\n\n";
#sort the creator code and file type
MacPerl::SetFileInfo('CARO', 'PDF ', $folder);
}
close(IN);
#----------- subroutine separator
#-----------
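
On the file-list question: HTTP itself has no directory-listing command, so one possible approach is to fetch whatever HTML page the server returns for the directory URL and pull the PDF links out of it. The sketch below does that with HTML::LinkExtor (part of the HTML-Parser distribution). It assumes that the URL really does answer with an HTML page linking to the PDFs, which I have not verified. The sub name pdf_names and the command-line usage are made up for illustration.

```perl
#!perl -w
use strict;
use LWP::Simple;        # get()
use HTML::LinkExtor;    # link extraction, from the HTML-Parser distribution

# Return the names of all .pdf files linked from a chunk of HTML.
# Only the last path segment of each link is kept, and duplicates are dropped.
sub pdf_names {
    my ($html) = @_;
    my $parser = HTML::LinkExtor->new;   # no callback: links are collected
    $parser->parse($html);
    $parser->eof;
    my (%seen, @names);
    for my $link ($parser->links) {
        my ($tag, %attr) = @$link;       # e.g. ('a', href => 'ADB.pdf')
        next unless $tag eq 'a' and defined $attr{href};
        # keep the file name only; ignore any directory part, query or fragment
        next unless $attr{href} =~ m{([^/?#]+\.pdf)(?:[?#].*)?$}i;
        push @names, $1 unless $seen{$1}++;
    }
    return @names;
}

# Assumed usage: pass the index URL as the first argument, e.g.
#   perl listpdfs.pl http://developer.apple.com/techpubs/mac/pdf/
# With no argument the script does nothing.
if (my $index = shift @ARGV) {
    my $html = get($index);
    defined $html or die "could not fetch $index\n";
    print "$_\n" for pdf_names($html);
}
```

The output is one file name per line, which is the shape the download script above expects in $fileDB, so it could be saved to that file or the two scripts could be merged.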