
Re: [MacPerl] HTML link checker?



At 2:47 am +0100 10.02.97, Vicki Brown wrote:
>I'm hoping someone on this list might know of the availability of an HTML
>link checker written for Perl/Mac-Perl.

There are probably dozens. However, never one to be afraid of reinventing
the wheel, I wrote one which I've just uploaded to:

   http://www.tardis.ed.ac.uk/~angus/library/Perl/Indexers/LinkCheck.sit.hqx

This should be considered a 'work in progress'; I just re-hacked an older
version because I needed to check my own site. It hasn't been tested very
thoroughly, and doesn't have all the features I'd like it to have, and thus
probably isn't really ready for prime-time. Such documentation as there is
- at the start of the main file - is probably misleading, because it refers
to the older version. Still, you should be able to figure out how to work
it by reading the comments ("Use the Force - read the source.").

The distribution is in BinHex'd StuffIt format, and contains eight files.

   RunLinkCheck              A droplet I use for launching the whole thing.
   LinkCheck.cfg             A configuration file
   LinkCheck.pl              The main script
   ConfigurationFile.pl      Handle configuration files
   ExclusionFile.pl          Handle exclusion files
   PathUtilities.pl          Path-munging utilities
   ProcessDirectoryTree.pl   Directory-processing shell
   URLUtilities.pl           URL-munging utilities

The last five files are all part of my homebrew 'library', and you should
make sure that they're somewhere in your usual include path, because
'LinkCheck.pl' will 'require' them. I don't actually include a sample
exclusion file (used to specify directories you don't want to process) but
the syntax is documented in 'ExclusionFile.pl', and should be trivial to
work out. Theoretically, you can customize the script's behaviour either by
setting up a configuration file or by passing command-line arguments (and,
also theoretically, it should work on UNIX as well as on the Macintosh).
However, I haven't tested either of these since I re-hacked the script.
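
As a rough illustration of how a configuration file and command-line
arguments can feed the same set of options, here is a minimal sketch in
Python (not Perl, and not the code in 'ConfigurationFile.pl'). The
'name = value' file syntax, the '--name=value' argument form, the option
names, and the rule that command-line values win are all assumptions made
for the example; the real formats are whatever the source comments say.

```python
def parse_config(text):
    """Parse hypothetical 'name = value' lines; '#' starts a comment."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue  # skip blank lines, comments and anything malformed
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def merge_settings(defaults, config_text, argv):
    """Layer the options: defaults, then config file, then command line."""
    settings = dict(defaults)
    settings.update(parse_config(config_text))
    for arg in argv:
        # Assumed argument form: --name=value
        if arg.startswith("--") and "=" in arg:
            key, value = arg[2:].split("=", 1)
            settings[key] = value
    return settings


# Hypothetical option names, for illustration only.
defaults = {"root": ".", "exclude": ""}
config_text = "root = site   # directory tree to check\n"
print(merge_settings(defaults, config_text, ["--exclude=skip.txt"]))
```

Layering the sources in a fixed order keeps the precedence rule in one
place, whichever way round you prefer it.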

The present version checks image links and anchors in HTML files and (I
think) in NCSA-format map files. In future, I aim to add code to get it to
recognise and process CERN map files, and to deal with EMBEDs and APPLETs
(which are going to be more awkward, because I'll have to take into account
the CODEBASE attribute).
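
For readers who want a feel for what such a check involves, here is a
minimal sketch in Python (not Perl, and not the code in 'LinkCheck.pl').
It assumes a site stored as local files, looks only at A HREF, IMG SRC and
APPLET CODE, and skips remote URLs, same-page fragments and site-root
paths. The function and class names are made up for the example. The
APPLET branch shows why CODEBASE is awkward: CODE has to be resolved
against CODEBASE first, and only then against the page.

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote, urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect (tag, url) pairs; tag and attribute names arrive lowercased."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append((tag, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.links.append((tag, attrs["src"]))
        elif tag == "applet" and attrs.get("code"):
            # CODEBASE names a directory, so make sure it ends in a slash
            # before resolving CODE against it.
            codebase = attrs.get("codebase", "")
            if codebase and not codebase.endswith("/"):
                codebase += "/"
            self.links.append((tag, urljoin(codebase, attrs["code"])))


def collect_links(html_text):
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


def broken_local_links(page_path, html_text):
    """Return the relative links on a page that name no existing file."""
    base_dir = os.path.dirname(os.path.abspath(page_path))
    broken = []
    for tag, url in collect_links(html_text):
        parts = urlsplit(url)
        if parts.scheme or parts.netloc or not parts.path:
            continue  # remote URL or same-page fragment
        if parts.path.startswith("/"):
            continue  # site-root path; needs the server's document root
        target = os.path.normpath(os.path.join(base_dir, unquote(parts.path)))
        if not os.path.exists(target):
            broken.append((tag, url))
    return broken
```

Calling broken_local_links("index.html", text) with the contents of a page
returns the links that need attention; walking a directory tree and
honouring an exclusion list would sit on top of this.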

If you download this and think you might want to keep using it, please mail
me to let me know, and I'll keep a note of your address so that I can send
you offers for holiday home timeshares, pyramid marketing schemes, and
information on how *you* can save up to 50% on your phone bill. Ahem, no,
of course I won't.  What I'll do is to mail you as new versions become
available, so that eventually you'll be able to grab one that might
actually work.

Requests for support, documentation, extra features etc., will be treated
with callous indifference and cruel derision.

                                           A

--
                 angus@pobox.com         http://pobox.com/~angus

  "I'm stubborn as those garbage bags that time will not decay.
   I'm junk but I'm still holding up my little wild bouquet."
  ["Democracy", Leonard Cohen]