>>>>> "Bill" == Bill Jones <bill@fccj.org> writes:

Bill> (ftp://|http://)[^ ]+

Bill> Now for the question: Your thoughts?

For my IRC bot, I scan for things that resemble URLs, without being too
strict. Since I'm not dealing with a large number of URLs at once, I send
a HEAD request to each URL candidate I find, and list it only if the
server returns a valid response.

Here is the main URL sniffing code:

    my $validchar = '[^\s<>"#{}|\\^\[\]\`@,]';

    for ( split ( /\s+/, $msg )) {
        next if ( ! ( tr/\./\./ ));   # What's a URL without dots?
        if ( m{
                (
                    (?:https?|ftp)://$validchar+
                  | (?:ftp|www)$validchar*\.$validchar+
                  | $validchar+[^.]\.(?:com|net|edu|org|us|uk|ca|de|se|au|jp|no|fr|nl|dk|tw)/?$validchar*
                )
            }xoi ) {
            my $URL = $1;
            found_url ( $nick, $host, $chan, $URL );
        }
    }

Things of note: Finding http:// is a sure sign that someone is giving a
URL. Failing that, a leading ftp or www smells like a URL. Looking for
common TLDs catches bare URLs like sneaky.ca, though obviously my list
isn't complete.

found_url does a little more cleaning of the URL:

    sub found_url ( $$$$ ) {
        my ( $nick, $host, $chan, $url ) = @_;

        $url =~ s/^[.,()<>{}?!]+//;   # Strip extra characters from the front of the url -CGH
        $url =~ s/[.,()<>{}?!]+$//;   # Ditto back.
        $url = uf_uristr( $url );     # uf_uristr is exported by URI::Heuristic
        ...
    }

This works pretty well for my own purposes. I'd love to hear suggestions
on how to make it better.

Thanks,
- Robert
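
For anyone who wants to experiment with the pattern outside the bot, here is a minimal standalone sketch that combines the sniffing regex with the front and back cleanup. The find_urls helper and the sample message are assumptions made up for illustration; they are not part of the bot. The HEAD-request check and the uf_uristr call are left out so it runs without network access or extra modules.

```perl
use strict;
use warnings;

# Same character class as in the post: anything that is not whitespace
# or one of the listed punctuation characters.
my $validchar = '[^\s<>"#{}|\\^\[\]\`@,]';

# Hypothetical helper (not in the original bot): return every URL
# candidate found in one message, after trimming stray punctuation.
sub find_urls {
    my ($msg) = @_;
    my @found;
    for my $word ( split /\s+/, $msg ) {
        next unless $word =~ tr/.//;    # count dots; no dots, no URL
        if ( $word =~ m{
                (
                    (?:https?|ftp)://$validchar+
                  | (?:ftp|www)$validchar*\.$validchar+
                  | $validchar+[^.]\.(?:com|net|edu|org|us|uk|ca|de|se|au|jp|no|fr|nl|dk|tw)/?$validchar*
                )
            }xi )
        {
            my $url = $1;
            $url =~ s/^[.,()<>{}?!]+//;    # trim leading punctuation
            $url =~ s/[.,()<>{}?!]+$//;    # trim trailing punctuation
            push @found, $url;
        }
    }
    return @found;
}

# Sample message (an assumption for the demo).
print "$_\n" for find_urls('see http://www.perl.com/ or sneaky.ca, but not foo');
```

Because the TLD alternative is not anchored at the end of the word, a word such as foo.community would also be picked up, which fits the "not too strict" approach where a later validity check weeds out false positives.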