
Re: [MacPerl] HTML Entities decoding



At 16:52 -0500 9/14/99, Matthew Langford wrote:
>I have an html file which I am parsing.  It has the HTML entity &#151;
>which Netscape shows as "--", an em dash.  When I use
>HTML::Entities::decode, I get an o with a forward-slash top (love those
>technical descriptions, don't ya?).  I think it's the same letter as typing
>option-e, o on the Mac.
>
>Can someone explain what is wrong?  It seems if Netscape knows how to
>translate properly, HTML::Entities should be able to, as well.  Is there
>another step that I'm missing?

Reference:  HTML The Definitive Guide July 1996 edition.  Used because it
was handy.

&#151; is "Nonstandard" (and should be an en dash, with &#150; being an em
dash).

In HTML 4, en-dash is &ndash; (or, better, &#8211;) and em-dash is &mdash;
(or, better, &#8212;).  I don't know how far back those go.

See <http://w3c.org/TR/REC-html40/sgml/entities.html>
then search for "dash"
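
As a quick check of the HTML 4 names and numbers above, here is a minimal
sketch. It uses Python's standard html module rather than Perl's
HTML::Entities (an assumption made only to keep the example
self-contained), and the variable names are illustrative.

```python
import html

# Named and numeric forms of the two dashes defined in HTML 4.
en_named = html.unescape("&ndash;")
en_numeric = html.unescape("&#8211;")
em_named = html.unescape("&mdash;")
em_numeric = html.unescape("&#8212;")

# Each pair should decode to the same Unicode character:
# U+2013 (en dash) and U+2014 (em dash).
for label, text in [("ndash", en_named), ("#8211", en_numeric),
                    ("mdash", em_named), ("#8212", em_numeric)]:
    print(label, "U+%04X" % ord(text))
```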

The numeric entities from 128 through 159 remain nonstandard (the easy
thing to do is just convert them to the byte of that value, and since they
are nonstandard, that can't be "wrong," just not very useful).
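
To see why "the byte of that value" is not very useful, here is a small
sketch (again in Python, as an assumption for illustration) that takes
byte 151 and reads it under three different character sets. The codec
names are Python's own; nothing here is taken from HTML::Entities.

```python
# The byte you get if &#151; is turned into "the byte of that value".
raw = bytes([151])

# The same byte means different things depending on the character set.
as_mac = raw.decode("mac_roman")    # what a Mac of that era would display
as_windows = raw.decode("cp1252")   # what the page author probably meant
as_latin1 = raw.decode("latin-1")   # ISO 8859-1: a C1 control character

for label, ch in [("mac_roman", as_mac), ("cp1252", as_windows),
                  ("latin-1", as_latin1)]:
    print(label, "U+%04X" % ord(ch))
```

On the Mac the byte comes out as o-acute, which matches the symptom in
the question, while under the Windows code page it is the em dash that
Netscape displays.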

   --John
-- 
John Baxter   jwblist@olympus.net      Port Ludlow, WA, USA
