
Re: [MacPerl] Character translation problem



> That one's over my head.

OK, a bit of background.  The original ASCII character set had only
128 characters, corresponding to the values 0-127 (00-7F in hex;
00000000-01111111 in binary).  The 8th (high-order) bit of each byte,
whose value is 128 (80 in hex; 10000000 in binary), was reserved for
"parity checking", a simple way to detect transmission errors.
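
To make the decimal/hex/binary correspondence concrete, here is a small
sketch that prints a byte value in all three forms.  The sub name
show_byte is made up for illustration; pack and unpack are standard Perl
built-ins.

```perl
#!/usr/bin/perl -w
use strict;

# Format one byte value (0-255) as "decimal  hex  binary".
# show_byte is a hypothetical helper name, not from the original post.
sub show_byte {
    my ($n) = @_;
    # pack "C" makes a single byte; unpack "B8" gives its 8 bits, high bit first.
    return sprintf("%3d  %02X  %s", $n, $n, unpack("B8", pack("C", $n)));
}

# 0 and 127 bound the 7-bit ASCII range; 128 is the high-order bit alone.
foreach my $n (0, 127, 128, 255) {
    print show_byte($n), "\n";
}
```

Note that 128 comes out as 80 in hex and 10000000 in binary: only the
high-order bit is set.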

Here is a listing of this ASCII character set, generated by typing
"man ascii" on my FreeBSD machine:

...
     The hexadecimal set:

     00 nul   01 soh   02 stx   03 etx   04 eot   05 enq   06 ack   07 bel
     08 bs    09 ht    0a nl    0b vt    0c np    0d cr    0e so    0f si
     10 dle   11 dc1   12 dc2   13 dc3   14 dc4   15 nak   16 syn   17 etb
     18 can   19 em    1a sub   1b esc   1c fs    1d gs    1e rs    1f us
     20 sp    21  !    22  "    23  #    24  $    25  %    26  &    27  '
     28  (    29  )    2a  *    2b  +    2c  ,    2d  -    2e  .    2f  /
     30  0    31  1    32  2    33  3    34  4    35  5    36  6    37  7
     38  8    39  9    3a  :    3b  ;    3c  <    3d  =    3e  >    3f  ?
     40  @    41  A    42  B    43  C    44  D    45  E    46  F    47  G
     48  H    49  I    4a  J    4b  K    4c  L    4d  M    4e  N    4f  O
     50  P    51  Q    52  R    53  S    54  T    55  U    56  V    57  W
     58  X    59  Y    5a  Z    5b  [    5c  \    5d  ]    5e  ^    5f  _
     60  `    61  a    62  b    63  c    64  d    65  e    66  f    67  g
     68  h    69  i    6a  j    6b  k    6c  l    6d  m    6e  n    6f  o
     70  p    71  q    72  r    73  s    74  t    75  u    76  v    77  w
     78  x    79  y    7a  z    7b  {    7c  |    7d  }    7e  ~    7f del
...

Somewhat later, computer vendors decided to use full 8-bit bytes, using
a 9th bit (if need be) for parity.  This freed up the 8th bit for use
as data, so some vendors started encoding oddball characters in the
range 128-255 (80-FF in hex; 10000000-11111111 in binary).  Sadly, they
didn't _agree_ with each other about the character assignments, so the
Apple, Microsoft, and other encodings are NOT compatible, by and large.
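
To see which bytes in a piece of text fall in this vendor-specific upper
range, something like the following sketch may help.  The sub name
high_bit_bytes and the sample string are made up for illustration, and it
assumes the text is being handled as plain 8-bit bytes.

```perl
#!/usr/bin/perl -w
use strict;

# Return the hex codes of every byte in the range 80-FF (high bit set).
# high_bit_bytes is a hypothetical helper name, not from the original post.
sub high_bit_bytes {
    my ($text) = @_;
    my @found;
    while ($text =~ /([\x80-\xFF])/g) {
        push @found, sprintf("%02x", ord($1));
    }
    return @found;
}

# Sample data (an assumption): an ordinary string with one stray E2 byte.
my @codes = high_bit_bytes("don\xE2t");
print "high-bit bytes: @codes\n";
```

Anything this reports is outside 7-bit ASCII, so its meaning depends on
which vendor's encoding produced it.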

So much for background.  You say you got the same results I got, as:

> e2: ’
> 27: '
> 62: b

IF you are getting exactly this, you should be able to use the code from
the $x2 case and clean up your data.  How exactly does this fail?
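
For reference, a minimal sketch of that kind of cleanup (not necessarily
identical to the $x2 code), assuming the only offending byte is E2 and
that a plain ASCII apostrophe (27) is what you want in its place.  The
sub name fix_quotes and the sample string are made up for illustration.

```perl
#!/usr/bin/perl -w
use strict;

# Replace every E2 byte with an ASCII apostrophe (hex 27).
# fix_quotes is a hypothetical helper name; the E2 -> 27 mapping is an
# assumption based on the hex values quoted above.
sub fix_quotes {
    my ($text) = @_;
    $text =~ tr/\xE2/\x27/;
    return $text;
}

print fix_quotes("don\xE2t"), "\n";
```

If other high-bit bytes turn up in the data, each one would need its own
entry in the tr lists.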

-r
--
Rich Morin:          rdm@cfcl.com, +1 650-873-7841, http://www.ptf.com/~rdm
Prime Time Freeware: info@ptf.com, +1 408-433-9662, http://www.ptf.com
MacPerl: http://www.macperl.com,       http://www.ptf.com/ptf/products/MPPE
MkLinux: http://www.mklinux.apple.com, http://www.ptf.com/ptf/products/MKLP
