
[MacPerl] Clarification of detecting "broken links"/invalid URLs



Last week I posted a question on how to validate URLs using
MacPerl. Reading the responses, I see that I wasn't clear about
the original problem.

What I meant to ask was how MacPerl can check whether a specified
URL (pointing to a document) actually refers to a document that exists.
The problem arises in the area of web page maintenance. I am
writing a utility (in MacPerl!) to cross-reference a web site and want
to check if each URL is still "valid"/referring to a valid document
on a Web site.

The web site in question is my "Creativity Web" which contains
many links to other documents on the web. What usually happens
is that someone asks me to put a link to their web site,
e.g. http://www.isp.com/~freddo/mypage.htm
Several months later, the user may no longer exist on that web
server, and someone else will report to me that the above
link is returning a status 404 - document not found on server.

I'm not sure if it is possible to detect web documents that consist
of a short page saying "This site has moved to the following
address...", as these are valid documents.
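
One rough way to flag such "moved" pages is to look at the text of
a page that has already been fetched. The sketch below is only a
heuristic: the sub name looks_moved, the phrase list and the
2000-byte size limit are all assumptions made for illustration, and
it will miss some notices and misfire on others.

```perl
use strict;
use warnings;

# Guess whether an HTML body is just a "this page has moved" notice.
# The 2000-byte limit and the phrases tested are arbitrary assumptions.
sub looks_moved {
    my ($html) = @_;

    # A real page is usually longer than a bare forwarding notice.
    return 0 if length($html) > 2000;

    # A META refresh tag sends the browser on to another address.
    return 1 if $html =~ /<meta[^>]+http-equiv\s*=\s*["']?refresh/i;

    # Typical wording of a forwarding notice.
    return 1 if $html =~ /\b(?:has|have)\s+moved\b/i;
    return 1 if $html =~ /\bnew\s+(?:address|location|URL)\b/i;

    return 0;
}
```

Links flagged this way are probably best listed for checking by
hand rather than removed automatically.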

I received some useful code from several list members and it seems
that the "connect" call is enough to validate a domain name, that
is, to confirm that the server can be reached. What my program needs
to do beyond that is to pretend to be a web browser and request a
specified web document from that server.
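
A minimal sketch of that idea, assuming the IO::Socket::INET module
is available in the MacPerl installation: connect to the server,
send an HTTP HEAD request for the document, and read the status
code from the first line of the reply. The sub names split_url,
status_code and check_url are made up for this example, and only
plain http:// URLs are handled.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Split an http:// URL into (host, port, path). Returns an empty list
# for anything this sketch does not handle (ftp, relative links, ...).
sub split_url {
    my ($url) = @_;
    return unless $url =~ m{^http://([^/:]+)(?::(\d+))?(/.*)?$}i;
    return ($1, $2 || 80, defined $3 ? $3 : '/');
}

# Pull the numeric code out of a status line such as
# "HTTP/1.0 404 Not Found". Returns undef if the line is not HTTP.
sub status_code {
    my ($line) = @_;
    return unless defined $line && $line =~ m{^HTTP/\d+\.\d+\s+(\d{3})};
    return $1;
}

# Send a HEAD request and return the status code, or undef if the URL
# is not understood, the host cannot be reached, or the reply is odd.
sub check_url {
    my ($url) = @_;
    my ($host, $port, $path) = split_url($url) or return;

    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => 15,
    ) or return;

    print $sock "HEAD $path HTTP/1.0\r\nHost: $host\r\n\r\n";
    my $line = <$sock>;     # first line of the reply is the status line
    close $sock;

    return status_code($line);
}
```

Calling check_url('http://www.isp.com/~freddo/mypage.htm') would
then give 200 for a document that is there, 404 for one that is
missing, and undef when the server itself cannot be contacted. A
301 or 302 means the server is redirecting the request elsewhere.
Some servers do not answer HEAD properly, so a GET request may be
needed as a fallback.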

I hope this makes my "problem" clearer!   Coincidentally, I found
a program called Big Brother which does one task: check external
links in an HTML document!  I would prefer to use MacPerl and then
I would learn something.

Charles


Creativity Web: http://www.ozemail.com.au/~caveman/Creative


------------------------------------------------------
Charles Cave                         Sydney, Australia
Email:                         charles@jolt.mpx.com.au
------------------------------------------------------