
Re: [MacPerl] HTML to Text Script ?



Boris Klug wrote:
> 
> Hi!
> 
> OK, it's not a Mac-related question, but does somebody know a Perl script
> which converts HTML to text? I set up a cron job on a Unix machine which
> emails a page to a few accounts.

Use libwww-perl5
http://www.sn.no/libwww-perl/

You should get the MacPerl version from:
http://mors.gsfc.nasa.gov/MacPerl.html
ftp://mors.gsfc.nasa.gov/pub/MacPerl/Scripts/libwww-perl-5.05.sit.hqx
by Paul Schinder schinder@leprss.gsfc.nasa.gov 

From the cookbook:
---
It is easy to convert HTML code to ``readable'' text. 

  use LWP::Simple;
  use HTML::Parse;
  print parse_html(get 'http://www.sn.no/libwww-perl/')->format;
---
It can't format tables, though.

You can do more by using HTML::Parser and HTML::Element. 
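
As a rough illustration of the HTML::Parser route, here is a minimal sketch. It assumes a version of HTML::Parser that supports the handler-based interface (api_version 3); the html_to_text helper and the choice of which tags to translate are my own, not something from the cookbook.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: returns plain text for an HTML string.
sub html_to_text {
    my ($html) = @_;
    my $out = '';

    my $p = HTML::Parser->new(
        api_version => 3,
        # "dtext" hands the handler text with entities already decoded.
        text_h  => [ sub { $out .= $_[0] }, 'dtext' ],
        # Mark bold text with underscores; turn <br> and <p> into newlines.
        start_h => [ sub {
            my ($tag) = @_;
            $out .= '_'  if $tag eq 'b';
            $out .= "\n" if $tag eq 'br' || $tag eq 'p';
        }, 'tagname' ],
        end_h   => [ sub {
            my ($tag) = @_;
            $out .= '_'  if $tag eq 'b';
            $out .= "\n" if $tag eq 'p';
        }, 'tagname' ],
    );

    $p->parse($html);
    $p->eof;    # flush any text the parser is still holding
    return $out;
}

print html_to_text('<p>Fish &amp; <b>chips</b><br>to go</p>');
```

Because the parser reports each tag by name, adding other tags is a matter of extending the two tag handlers. Tables would still need their own layout logic.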

> So I need a script that converts HTML to text. This means:
> 
> 1) Remove html tags
> 2) Convert entities
> 3) Convert line breaks
> 
> and maybe
> 
> 4) Convert some tags to text equivalents (e.g. <B>bla</B> to _bla_)
> 5) Convert tables
>   ... and more ...
> 
> If nobody knows about such a script, I will write it on my own and post
> it to the CPAN archive.
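
If you do end up writing it yourself, steps 1 to 4 above can be roughed out with plain regular expressions and no extra modules. This is only a sketch under loud assumptions: the strip_html name and the small entity table are made up for illustration, it expects simple, well-formed HTML, and it does nothing for tables (step 5). A real parser is more robust.

```perl
use strict;
use warnings;

# Hypothetical helper; a regex-based sketch, not a real HTML parser.
# Only a handful of named entities are known here (an assumption).
my %ENTITY = (amp => '&', lt => '<', gt => '>', quot => '"', nbsp => ' ');

sub strip_html {
    my ($html) = @_;
    my $text = $html;

    # 3) Source line breaks are just whitespace in HTML;
    #    <br> and <p> are the real breaks.
    $text =~ s/\s+/ /g;
    $text =~ s/<br\s*\/?>/\n/gi;
    $text =~ s/<\/?p\b[^>]*>/\n\n/gi;

    # 4) Map bold markup to underscores: <B>bla</B> becomes _bla_.
    $text =~ s/<\/?b>/_/gi;

    # 1) Remove any remaining tags.
    $text =~ s/<[^>]*>//g;

    # 2) Convert numeric entities and the few named ones listed above;
    #    unknown named entities are left untouched.
    $text =~ s/&#(\d+);/chr($1)/ge;
    $text =~ s/&(\w+);/exists $ENTITY{$1} ? $ENTITY{$1} : "&$1;"/ge;

    # Tidy up: trim spaces around newlines, collapse blank runs, trim ends.
    $text =~ s/ *\n */\n/g;
    $text =~ s/\n{3,}/\n\n/g;
    $text =~ s/^\s+|\s+$//g;
    return $text;
}

print strip_html("<p>Fish &amp; <B>chips</B><br>\nto go</p>"), "\n";
```

Entities are decoded after the tags are removed, so an escaped "&lt;b&gt;" in the page survives as literal text instead of being stripped as a tag.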

Cheers,
Paul

-- 
"Create like a God, Command like a King, and Work like a slave" Brancusi