Re: Wide character implementation

Subject: Re: Wide character implementation
From: Erik Naggum <erik@naggum.net>
Date: Sun, 24 Mar 2002 01:46:30 GMT
Newsgroups: comp.lang.lisp,comp.lang.scheme
Message-ID: <3225923202075012@naggum.net>

* Sander Vesik
| I also couldn't care less what you think of me.

  You should realize that only people who care a lot make this point.

| It is pointless to think of glyphs in any other way than characters - it
| should not make any difference whether a-diaeresis is represented by one
| code point - the precombined one - or two.  In fact, if there is a
| detectable difference from anything dealing with text strings the
| implementation is demonstrably broken.

  It took the character set community many years to figure out the crucial
  conceptual and then practical difference between the "characteristic
  glyph" of a character and the character itself, namely that a character
  may have more than one glyph, and a glyph may represent more than one
  character.  If you work with characters as if they were glyphs, you
  _will_ lose, and you make just the kind of arguments that were made by
  people who did _not_ grasp this difference in the ISO committees back in
  1992 and who directly or indirectly caused Unicode to win over the
  original ISO 10646 design.  Unicode has many concessions to those who
  think character sets are also glyph sets, such as the presentation forms,
  but that only means that there are different times you would use
  different parts of the Unicode code space.  Some people who try to use
  Unicode completely miss this point.
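
  The many-to-many relation between glyphs and characters can be seen in
  the compatibility mappings of the presentation forms.  The following is
  a minimal sketch, assuming Python's standard unicodedata module and
  assuming NFKC compatibility normalization as the way to fold presentation
  forms back to plain characters; neither is named in the text above, and
  the variable names are chosen for illustration only.

```python
import unicodedata

# One glyph, more than one character: the "fi" ligature presentation form
# folds to the two characters 'f' and 'i' under compatibility normalization.
LIGATURE_FI = "\ufb01"  # LATIN SMALL LIGATURE FI
folded = unicodedata.normalize("NFKC", LIGATURE_FI)

# One character, more than one glyph: the Arabic presentation forms encode
# positional shapes of ALEF, and both fold to the same abstract character.
ALEF = "\u0627"           # ARABIC LETTER ALEF
ALEF_ISOLATED = "\ufe8d"  # ARABIC LETTER ALEF ISOLATED FORM
ALEF_FINAL = "\ufe8e"     # ARABIC LETTER ALEF FINAL FORM
alef_forms = {
    unicodedata.normalize("NFKC", shape)
    for shape in (ALEF_ISOLATED, ALEF_FINAL)
}

print(len(folded), [unicodedata.name(c) for c in folded])
print(alef_forms == {ALEF})
```

  The presentation forms sit in one part of the code space and the abstract
  characters in another, which is why the two are used at different times.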

  It also took some _companies_ a really long time to figure out the
  difference between glyph sets and character sets.  (E.g., Apple and
  Xerox, and, of course, Microsoft has yet to reinvent the distinction
  badly in the name of "innovation", so their ISO 8859-1-like joke violates
  important rules for character sets.)  I see that you are still in the
  pre-enlightenment state of mind and have failed to grasp what Unicode
  does with its three levels.  I cannot help you, since you appear to stop
  thinking in order to protect or defend yourself or whatever (it sure
  looks like some mideast "honor" code to me), but if you just pick up the
  standard and read its excellent introductions or even Unicode: A Primer,
  by Tony Graham, you will understand a lot more.  It does an excellent job
  of explaining the distinction between glyph and character.  I think you
  need it much more than trying to defend yourself by insulting me with
  your ignorance.

  Now, if you want to use or not use combining characters, you make an
  effort to convert your input to your preferred form before you start
  processing.  This isolates the "problem" to a well-defined interface, and
  it is no longer a problem in properly designed systems.  If you plan to
  compare a string with combining characters with one without them, you are
  already so confused that there is no point in trying to tell you how
  useless this is.  This means that thinking in terms of "variable-length
  characters" is prima facie evidence of a serious lack of insight _and_ an
  attitude problem that something somebody else has done is wrong and that
  you know better than everybody else.  Neither is a problem with Unicode.
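
  Converting input to a preferred form at the interface might look like the
  sketch below.  It assumes Python's standard unicodedata module and
  assumes that "preferred form" means one of the Unicode normalization
  forms (NFC for precomposed, NFD for decomposed); the text above does not
  name a specific form, and to_preferred_form is a hypothetical helper.

```python
import unicodedata

def to_preferred_form(text: str, form: str = "NFC") -> str:
    """Convert input to a single normalization form at the boundary.

    "NFC" prefers precomposed characters; "NFD" prefers base characters
    followed by combining characters. The default is an assumption.
    """
    return unicodedata.normalize(form, text)

precomposed = "\u00e4"  # LATIN SMALL LETTER A WITH DIAERESIS
combining = "a\u0308"   # LATIN SMALL LETTER A + COMBINING DIAERESIS

# Comparing unconverted input mixes the two representations.
raw_equal = precomposed == combining

# After conversion at the interface, the rest of the program only ever
# sees one representation, so ordinary comparison is sufficient.
converted_equal = to_preferred_form(precomposed) == to_preferred_form(combining)

print(raw_equal, converted_equal)
```

  Everything past the conversion point works with one form only, so no
  later code has to know that the other form exists.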

///
-- 
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.