
Re: [FWP] R/E Question



Quoting Bill Jones (bill@fccj.org):
> ([Ee]-?mail:)?[a-zA-Z]+([._][a-zA-Z]+)*@[^ ]+
> 
> (ftp://|http://)[^ ]+
> 
> 
> Now for the question:  Your thoughts?

Well, since no one else has mentioned it... there's no such URL scheme as
e-mail: or email:. I think you mean mailto:.

This is the (POSIX-style) regexp I use with a hacked version of urlview
(from the people that brought you mutt!) to extract URLs from email
messages. I've found it more useful than the others I've encountered. The
major trick is to use a different character class for terminating characters
than for internal characters, so that trailing punctuation (a full stop, a
comma, a closing bracket) isn't swallowed as part of the URL.

(([-a-z]+://|(www|ftp)[-.0-9])[a-z0-9][-a-z0-9.]+[a-z0-9](:[-0-9a-z]+)?(/([^ ><"\t]*[^ ><"\t.,;)'?!])?)?|(mailto|news|about):[^ ><"\t]*[^ ><"\t.,;)'?!])

Converting to Perl-style syntax is left as an exercise for the reader :-)
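
For anyone who wants to see the terminating-class trick in action, here is a
rough sketch of the same pattern in Python's re syntax rather than Perl. The
names INNER, LAST, URL_RE and extract_urls are made up for illustration, and
case-insensitive matching is an assumption on my part. Note also that Python
uses a backtracking matcher rather than POSIX leftmost-longest matching, so
results could differ from urlview's on some inputs.

```python
import re

# Characters allowed INSIDE a URL: anything but space, <, >, " and tab.
INNER = r'[^ ><"\t]'
# Characters allowed to END a URL: as INNER, minus trailing punctuation.
LAST = r'''[^ ><"\t.,;)'?!]'''

URL_RE = re.compile(
    # scheme://host or a bare www.../ftp... host
    r'(?:(?:[-a-z]+://|(?:www|ftp)[-.0-9])'
    r'[a-z0-9][-a-z0-9.]+[a-z0-9]'
    # optional :port
    r'(?::[-0-9a-z]+)?'
    # optional /path, which must end on a LAST character
    r'(?:/(?:' + INNER + '*' + LAST + r')?)?'
    # or one of the schemes that take no //
    r'|(?:mailto|news|about):' + INNER + '*' + LAST + ')',
    re.IGNORECASE,  # assumption: match case-insensitively
)

def extract_urls(text):
    """Return every URL-like substring found in text, in order."""
    return [m.group(0) for m in URL_RE.finditer(text)]

sample = ('See http://www.example.com/foo/bar.html, or (www.foo.com/bar). '
          'Mail mailto:someone@example.org!')
for url in extract_urls(sample):
    print(url)
```

The comma, the closing bracket and the exclamation mark in the sample are all
legal inside a URL, but because they are excluded from LAST the matcher backs
off them at the end of each match.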

Oh yeah, it will match things like www.foo.com/bar, so if you want to get
useful links from it, your code has to be ready to prepend http:// or
ftp:// to the matched string if necessary.
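
That fix-up step might look something like the following Python sketch. The
function name add_scheme is hypothetical, and guessing the scheme from the
leading "ftp" or "www" of the host name is a simplifying assumption, not
something urlview is known to do.

```python
import re

# Matches that already carry a scheme: either scheme:// or one of the
# schemes written without slashes.
HAS_SCHEME = re.compile(r'(?:[-a-z]+://|(?:mailto|news|about):)', re.IGNORECASE)

def add_scheme(match):
    """Prepend http:// or ftp:// to a bare host match; leave full URLs alone."""
    if HAS_SCHEME.match(match):
        return match
    # Assumption: a host name starting with "ftp" is an FTP server,
    # anything else (e.g. www.foo.com) is a web server.
    if match.lower().startswith('ftp'):
        return 'ftp://' + match
    return 'http://' + match

print(add_scheme('www.foo.com/bar'))
print(add_scheme('ftp.foo.com/pub'))
print(add_scheme('mailto:someone@example.org'))
```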

-- 
Adam Rice -- wysiwyg@glympton.airtime.co.uk -- Blackburn, Lancashire, England
