Subject: Re: strings and characters
From: Erik Naggum <erik@naggum.no>
Date: 2000/03/19
Newsgroups: comp.lang.lisp
Message-ID: <3162477100853468@naggum.no>

* Gareth McCaughan <Gareth.McCaughan@pobox.com>
| I'm not Barry, but I think I can.  Provided I'm allowed to use the
| HyperSpec (which I have) rather than the Standard itself (which I don't).

  note that this all hinges on the definition of STRING, not CHARACTER.

  we all agree that character objects may have implementation-defined
  attributes.  the crux of the matter is whether strings are _required_ to
  support these implementation-defined attributes for characters stored in
  them, or are _permitted_ to hold only simple characters, i.e., characters
  that have null or no implementation-defined attributes.  sadly, nothing
  you bring up affects this crucial argument.

  there are two compelling reasons why implementation-defined attributes
  are _not_ required to be retained in strings: (1) there is special
  mention of which implementation-defined attributes are discarded when
  reading a string literal from an input stream (which apparently may
  support reading them, but nothing is indicated as to how this happens),
  and (2) historically, strings did not retain bits and fonts, so if they
  were to be supported by an implementation that conformed to CLtL1, they
  would have to be _added_ to strings, while bits and fonts were explicitly
  _removed_ from the language.

| 1. MAKE-STRING is defined to return "a string ... of the most
|    specialized type that can accommodate elements of the given
|    type".
| 
| 2. The default "given type" is CHARACTER.
| 
| 3. Therefore, MAKE-STRING with the default ELEMENT-TYPE
|    returns a string "that can accommodate elements of the
|    type CHARACTER".

  the question boils down to whether the character concept as defined in
  isolation is the same as the character concept as defined as part of a
  string.  if they are the same, your logic is impeccable.  if they aren't,
  your argument is entirely moot.  I'm arguing that the crucial clue that
  there is a difference is the unique "union type" of strings and the
  phrase "or a subtype of character", which is not used of any other
  specialized array in the way it is used of strings -- no other type
  permits _only_ a subtype.
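
  for illustration only, a minimal sketch of how one might ask a running
  implementation what MAKE-STRING actually hands back.  the function name
  STRING-ELEMENT-TYPES is hypothetical, only standard operators are used,
  and the values printed are implementation-dependent, so they show what
  one implementation does, not what the standard _requires_:

```lisp
;; hypothetical helper, not part of the standard: collect the actual
;; element types of the strings MAKE-STRING returns.
(defun string-element-types ()
  (list
   ;; default :ELEMENT-TYPE, i.e., CHARACTER
   (array-element-type (make-string 1))
   ;; an explicitly requested subtype of CHARACTER
   (array-element-type (make-string 1 :element-type 'base-char))
   ;; what an array specialized on CHARACTER is upgraded to
   (upgraded-array-element-type 'character)))

;; the three values are implementation-dependent
(print (string-element-types))
```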

  I'm arguing that an implementation is permitted to have a weaker
  character concept in strings than in isolation, i.e., that strings may
  hold _only_ a subtype of character, and that implementation-defined
  attributes are defined to exist (i.e., be non-null) only in isolated
  character objects, not in characters as stored in strings.

| Now,
| 
| 5. A "string" is defined as "a specialized vector ... whose
|    elements are of type CHARACTER or a subtype of type CHARACTER".

  _please_ note that no other specialized vector type is permitted the
  leeway that "or a subtype of" implies here.  for some bizarre reason, the
  bad imitation jerk from Harlequin thought that he could delete "of type
  CHARACTER" since every type is a subtype of itself.  however, the key is
  that this wording effectively allows a proper subtype of character to be
  represented in strings.  no similar wording exists _elsewhere_ in the
  standard, which makes this distinction all the more significant.

| 8. There is such a thing as a specialized array with elements
|    of type CHARACTER or some subtype thereof, which is capable
|    of holding arbitrary things of type CHARACTER as elements.

  this is a contradiction in terms, so I'm glad you conclude this, as it
  shows that carrying "or a subtype thereof" with you means precisely that
  the standard does not require a _single_ string type to be able to hold
  _all_ character values.  that is why string is a union type, unlike all
  other types in the language.
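
  a hedged sketch of how the union shows up in practice.  the helper name
  DESCRIBE-STRING-TYPE is hypothetical.  every object MAKE-STRING returns
  is a STRING, but whether a string made for BASE-CHAR is also of type
  (VECTOR CHARACTER) depends on the implementation, and where it is not,
  that implementation has more than one string representation:

```lisp
;; hypothetical helper: classify a string made with the given element type.
(defun describe-string-type (element-type)
  (let ((s (make-string 1 :element-type element-type)))
    (list :stringp (stringp s)
          ;; true when this string's actual element type is the one that
          ;; arrays of CHARACTER are upgraded to; implementation-dependent
          ;; for proper subtypes such as BASE-CHAR.
          :vector-of-character-p (typep s '(vector character)))))

(print (describe-string-type 'character))
(print (describe-string-type 'base-char))
```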
  
| I'd have thought that if strings were special in the kind of way you're
| saying they are, there would be some admission of the fact here.  There
| isn't.

  there is particular mention of "or a subtype of character" all over the
  place when strings are mentioned.  that's the fact you're looking for.

  however, if you are willing to use contradictions in terms as evidence of
  something and you're willing to ignore facts on purpose, there is not
  much that logic and argumentation alone can do to correct the situation.

| I have been unable to find anything in the HyperSpec that justifies this.

  again, look for "or a subtype of character" in the definition of STRING.

#:Erik