Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)
From: Erik Naggum <erik@naggum.net>
Date: Mon, 25 Mar 2002 05:06:41 GMT
Newsgroups: comp.lang.lisp
Message-ID: <3226021614417921@naggum.net>

* Ed L Cashin <ecashin@uga.edu>
| Could you elaborate on that a bit?  I'm interested because it appears
| that your position is that case-sensitivity in identifiers is a Bad
| Thing for programming languages.

  I consider it a bad thing to believe that A is a different character from
  a just because it has a certain "presentation property".  I mean, we do
  not distinguish characters based on font or face, underlining or color,
  and most people realize that these are incidental properties.  However,
  capitalness of a letter is just as incidental: The fact that a letter is
  capitalized depending on such randomness as the position of the word in
  the sentence is a very strong indicator that "However" and "however" are
  not different words, which is effectively what case-sensitive people
  think they are.  I tried to publish text without this incidental property
  for a while, but it seemed to tick people off even more than calling an
  idiot an idiot.

| A general principle of mine is that if things are distinguishable, they
| should not be collapsed but the distinction should be preserved whenever
| possible.  Treating different characters as the same character, or
| treating different character sequences as equivalent, should be postponed
| as long as possible in order to preserve information.

  If you use colors to distinguish keywords from identifiers in your editor,
  can you use a keyword with a different color as an identifier?

| Are you suggesting that this principle is inappropriate to apply to the
| character sequences that compose identifiers in source code?  That would
| mean that "ABLE" is the same identifier as "able".

| I must admit that when I first found out that current lisps have
| case-insensitive symbol names, I thought it reminiscent of BASIC -- kind
| of a throwback to a time when memory was much more at a premium.

  But this is not the case.  The symbol names are case-sensitive, but the
  Common Lisp reader maps all unescaped characters to uppercase by default.
  You can change this.  Symbols are in this fashion just like normal words
  in your natural language.
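
  The reader behaviour described above can be modelled roughly as follows.
  This is only a sketch in Python, not the actual Common Lisp reader: the
  function name read_symbol_name and its readtable_case parameter are
  made up for illustration, the :invert mode is left out, and package
  prefixes and number syntax are ignored.

```python
def read_symbol_name(token, readtable_case="upcase"):
    """Return the symbol name that `token` would denote.

    Unescaped characters are case-folded according to `readtable_case`;
    characters after a backslash or between vertical bars are kept as typed.
    """
    fold = {
        "upcase": str.upper,
        "downcase": str.lower,
        "preserve": lambda ch: ch,
    }[readtable_case]

    out = []
    in_bars = False
    i = 0
    while i < len(token):
        ch = token[i]
        if ch == "\\":
            # Single escape: take the next character literally.
            i += 1
            if i >= len(token):
                raise ValueError("backslash at end of token")
            out.append(token[i])
        elif ch == "|":
            # Multiple escape: toggle literal mode.
            in_bars = not in_bars
        elif in_bars:
            out.append(ch)
        else:
            out.append(fold(ch))
        i += 1
    if in_bars:
        raise ValueError("unterminated |...|")
    return "".join(out)


print(read_symbol_name("car"))    # CAR
print(read_symbol_name("Car"))    # CAR
print(read_symbol_name("|car|"))  # car
```

  Under these assumptions "car", "Car" and "CAR" all name the same symbol,
  whose stored name is the case-sensitive string "CAR", while |car| names a
  different one.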

| (I know that Lisp predates BASIC.  I'm talking about my reaction.)  I'd
| be happy to hear a good case for case-insensitive identifiers.

  I think case sensitivity is an abuse of an incidental property.  Thus, I
  want to hear a good case for case-sensitive identifiers.  Older languages
  did not have this property, but after Unix (which has a case-insensitive
  tty mode!), the norm became to distinguish case, largely because there
  was no other namespace functionality in early C.  Unix also chose to use
  lower-case commands whereas Multics had always supported case-folding.  I
  believe the reason that the Unix people wanted to distinguish case was
  that it would require an extra instruction and a lookup table that would
  waste a precious 128 bytes of memory in the kernel, while we currently
  waste an enormous amount of memory to keep case-folding tables several
  times over.  In my view, case-sensitive identifiers have become the norm
  in a community that has failed to think about proper solutions to their
  problems, but rather chooses to solve only the immediate problem, much
  like C strongly encourages irrelevant micro-optimization.  So instead of
  being nice to the user, they were nice to the programmer, who did not
  have to case-fold the incoming identifiers.  I consider moving this
  burden onto the user to be quite user-inimical and actually quite foreign
  to people who do not know the character coding standards.  I mean, do we
  have case-sensitive trademarks, even though we traditionally capitalize
  proper names?  Are Oracle and ORACLE different companies any more than
  ORACLE in red boldface 14 point Times Roman is a different company than
  ORACLE in blue italic 12 point Helvetica?

  There has definitely been a "paradigm shift" in computer people's view on
  case, but not in non-computer people.  Internet protocols like SMTP use
  case-insensitive commands.  The DNS is case-insensitive.  SGML is
  case-insensitive and so is HTML.  Because of the huge problems we face
  with case-folding Unicode (which must be done with a table of some kind),
  some people have figured that we should _not_ do case-folding.  That is
  the wrong solution to the problem.  The right solution to the problem is
  to get rid of case as a character property.
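
  To make the size of the problem concrete, here is a sketch in Python of
  the two situations: a 128-entry table is enough to fold ASCII, as in the
  kernel example earlier, while Unicode folding depends on much larger
  tables and is not a simple one-to-one mapping.  The names ASCII_FOLD,
  ascii_fold and same_hostname are hypothetical; str.casefold, str.upper
  and str.lower are the standard Python string methods.

```python
# One byte per ASCII code point: A-Z map to a-z, everything else to itself.
ASCII_FOLD = bytes(c + 32 if 65 <= c <= 90 else c for c in range(128))


def ascii_fold(name: bytes) -> bytes:
    """Fold ASCII letters to lower case with a single table lookup per byte."""
    return bytes(ASCII_FOLD[b] if b < 128 else b for b in name)


def same_hostname(a: str, b: str) -> bool:
    """Compare two ASCII host names the way a case-insensitive lookup would."""
    return ascii_fold(a.encode("ascii")) == ascii_fold(b.encode("ascii"))


print(len(ASCII_FOLD))                              # 128
print(same_hostname("Naggum.NET", "naggum.net"))    # True

# Unicode folding relies on the tables built into the language runtime,
# and the mappings are neither length-preserving nor reversible.
print("Stra\u00dfe".casefold() == "STRASSE".casefold())  # True
print("\u00df".upper())                                  # SS
print("\u0131".upper().lower() == "\u0131")              # False (dotless i)
```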

  Now, assume that we no longer have different character codes for lower-
  case and upper-case letters.  Would there be any difference in how we
  look at text on computer screens, in print, etc?  No, of course not.
  Therefore, people would still be able to distinguish identifiers visually
  based on case if they want to -- just like the Common Lisp reader allows
  you to write |car| to refer to the symbol named "car", and |CAR| to refer
  to the symbol named "CAR", and just like Unix can deal with upper- and
  lower-case letters even when iuclc and olcuc are in effect with the xcase
  option by backslashing the real uppercase characters in your input.  (In
  Common Lisp, you would backslash a lower-case character in the default
  reader mode, and the printer will escape those characters that should not
  be case-folded.)  However, being able to do something and actually doing
  it are two very different things.  E.g., on TOPS-20, you could use
  lower-case letters in filenames if you really wanted to, by prefixing
  them with ^V.  Very few people bothered to do this because typing it in
  was a hassle.  I do not propose any change to how we input upper and
  lower case, but with the anal-retentive approach to saving bits, which
  has even gone so far as to write FooBarZot instead of foo-bar-zot, the
  probability that the C freaks would have chosen case-sensitivity would be
  remarkably lower -- if we could go back and design the world over...
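
  The printer side of the escaping mentioned in the parenthesis above can
  be sketched in the same spirit.  This is a Python approximation, not the
  Common Lisp printer: the function name print_symbol_name is invented, it
  assumes the default upcasing reader, and it ignores packages and names
  that would otherwise read as numbers.

```python
def print_symbol_name(name):
    """Write a symbol name so that an upcasing reader reads it back unchanged.

    Names that survive upcasing and contain no special characters are
    written bare; anything else is wrapped in vertical bars, with `|` and
    `\\` inside the bars escaped by a backslash.
    """
    special = set('|\\()"\';` \t\n')
    needs_bars = (
        name == ""
        or name != name.upper()
        or any(ch in special for ch in name)
    )
    if not needs_bars:
        return name
    escaped = "".join("\\" + ch if ch in "|\\" else ch for ch in name)
    return "|" + escaped + "|"


print(print_symbol_name("CAR"))        # CAR
print(print_symbol_name("car"))        # |car|
print(print_symbol_name("FooBarZot"))  # |FooBarZot|
```

  Nothing is lost in either direction: the distinction is still available
  to those who want it, but it takes a deliberate extra effort to use.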
  
///
-- 
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.