
Re: [MacPerl] extracting URL's



gauden@synapse.net.mt (Gauden Galea) wrote

>Hi to all,
>
>I am trying to write a script that scans text files (mailing list messages
>that are saved as separate files by Apple Internet Mail Server), extracts
>any URLs in the text and makes them "live", basically to be able to mirror
>Sandra Silcot's achievement with this list for selected medical mailing
>lists. I have gone through lib-www.pl in the hope that I might find a
>solution (and searched the Silcot archive for previous discussions on this
>matter). I can't seem to find a solution. Can anybody help please?
>
>Thanks in advance,
>
>Gauden Galea
>http://www.synapse.net.mt/

These may be of use to you.

In article <5589bi$ll4@Holly.aa.net>, louis@globalxs.nl (Louis Lubbers
B.Sc.) wrote:

[posted && cc'd]

+ I maintain a database of web sites and am now looking for a perl
+ script that will check the links in my database.

I don't know if this will do what you want, but checkbot (v1.41 is the
latest) may do. See <url:http://dutifp.twi.tudelft.nl:8000/checkbot/>.

Note: Checkbot has the following software requirements: 

    perl 5.002 
    LWP 5.02. (libwww-perl 5 module) 
    libnet-1.00 (required by LWP) 
    Mail::Send (optionally with option -M, available in Mailtools archive) 

James

-- 
#!/bin/perl -s-- -export-a-crypto-system-sig -RSA-3-lines-PERL
$m=unpack(H.$w,$m."\0"x$w),$_=`echo "16do$w 2+4Oi0$d*-^1[d2%Sa
2/d0<X+d*La1=z\U$n%0]SX$k"[$m*]\EszlXx++p|dc`,s/^.|\W//g,print
pack('H*',$_)while read(STDIN,$m,($w=2*$d-1+length$n&~1)/2)


Eddy De Clercq<Eddy.DeClercq@coi.be> wrote
(in article <32670ff7.0@news.bru.tfi.be>):
>Hi,
>
>
>Does anyone have an example of how to build an HTTP user agent/spider or
>links to existing source?

See: http://www.antipope.demon.co.uk/charlie/webbook/robot/index.html

for some example spiders and notes about them.


-- Charlie

-- 
Charlie Stross charlie@antipope.demon.co.uk http://www.antipope.demon.co.uk/
If you don't shoot the fish in your barrel, your barrel will soon be
full of fish. -- Tim Mefford
"NSA terrorists eat shitake mushrooms in Scunthorpe" <-- censor-bot jammer
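
For the original question (finding URLs in saved messages and making them
"live"), a plain regular expression may be enough without any of the
libraries above. The following is only a minimal sketch, not a tested
solution for Apple Internet Mail Server files. The URL pattern, the
sub names linkify and escape_html, and the choice to write FILE.html next
to each input file are all my own assumptions.

```perl
#!/usr/bin/perl -w
use strict;

# Assumed URL pattern: a scheme, "://", then everything up to whitespace,
# an angle bracket or a double quote. Adjust to suit your messages.
my $URL_RE = '(?:https?|ftp|gopher)://[^\s<>"]+';

# Escape the characters that are special in HTML.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Return $text as HTML with every URL wrapped in an <a href="..."> anchor.
sub linkify {
    my ($text) = @_;
    my $html = '';

    # The capturing parentheses make split keep the URLs in the list:
    # even-numbered elements are plain text, odd-numbered ones are URLs.
    my @parts = split /($URL_RE)/, $text;

    foreach my $i (0 .. $#parts) {
        if ($i % 2) {
            my $url   = $parts[$i];
            my $trail = '';
            # Punctuation at the very end usually belongs to the sentence,
            # not to the URL, so move it back into the plain text.
            if ($url =~ s/([.,;:!?)]+)$//) {
                $trail = $1;
            }
            my $safe = escape_html($url);
            $html .= qq{<a href="$safe">$safe</a>} . escape_html($trail);
        }
        else {
            $html .= escape_html($parts[$i]);
        }
    }
    return $html;
}

# Convert each file named on the command line, writing FILE.html beside it.
foreach my $file (@ARGV) {
    open(IN, $file) or die "Can't read $file: $!";
    my $text = join '', <IN>;
    close IN;

    open(OUT, ">$file.html") or die "Can't write $file.html: $!";
    print OUT "<pre>\n", linkify($text), "</pre>\n";
    close OUT;
}
```

The output is wrapped in <pre> so the message keeps its line breaks. The
pattern will miss bare addresses such as www.synapse.net.mt that have no
scheme in front, and it knows nothing about mail headers, so treat it as a
starting point only.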

________________________________________________________________
          Bob Wilkinson, Perl Programmer, Pindar plc
Tel: +44 (0)1904 613040    Email: B.Wilkinson@pindar.co.uk
Fax: +44 (0)1904 613110    URL: http://www.connection.co.uk/bob
________________________________________________________________
  I don't speak for my employer - er, they made me say that..
________________________________________________________________