Subject: Re: symbol name with case preserved
From: Erik Naggum <erik@naggum.no>
Date: 18 Jan 2004 01:22:22 +0000
Newsgroups: comp.lang.lisp
Message-ID: <3283377742132730KL2065E@naggum.no>

* William J. Lamar
| Do any of the free-as-in-freedom Common Lisp implementations (CMUCL,
| SBCL, CLISP, GCL, etc.) support a way to get the name of a symbol with
| its case preserved?

  The route from source to symbol name involves case conversion, but the
  route from symbol name to string does not.  This is important if you
  want to understand this admittedly complex issue, but it is complex
  mostly because people do not pay attention to detail.

| Although Common Lisp is a case-insensitive language,

  This is not true.
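  A small illustration, assuming the standard readtable (whose
  READTABLE-CASE is :UPCASE) and a package such as COMMON-LISP-USER.
  Symbol names themselves are case-sensitive.  Only the reader's
  conversion of unescaped characters makes source code look otherwise:

```lisp
;; INTERN performs no case conversion and looks names up
;; case-sensitively, so these are two distinct symbols.
(eq (intern "Hello") (intern "HELLO"))   ; => NIL

;; The reader upcases unescaped characters before interning, so these
;; two spellings in source code denote the same symbol, HELLO.
(eq 'hello 'HELLO)                        ; => T
```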

| it nevertheless would be possible for an implementation to have an
| extension function that returns a string representation of a symbol
| the way the symbol appeared in the source code.

  No, this is not possible.  Common Lisp does not retain the source from
  which it builds its internal representation.  This is actually crucial
  to understand.  Common Lisp is defined on the internal form, not on
  the character sequence that is the source.
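  The reader's output carries no trace of the spelling it consumed.  A
  sketch, again assuming the standard readtable: differently cased
  spellings read as the identical symbol, so there is nothing from which
  any function could recover "Hello" afterwards:

```lisp
;; Three spellings, one symbol: the reader has already discarded the
;; original case by the time the symbol exists.
(let ((a (read-from-string "Hello"))
      (b (read-from-string "hello"))
      (c (read-from-string "HELLO")))
  (list (eq a b) (eq b c) (symbol-name a)))
;; => (T T "HELLO")
```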

| To help illustrate what I am describing, here is part of a session
| with CMUCL:
| 
|     * (symbol-name 'Hello)
|     "HELLO"
|     * 
| 
| What I am looking for is a function foo where
| 
|     (foo 'Hello)
| 
| would result in the string "Hello".

  Now that you understand that the reader transforms a character source
  into an internal representation and that you can never recover that
  character source, you know that you need to ask for ways to retain the
  case of the symbols that you read.

  The first, which is the simplest and which requires minimal effort to
  understand and use properly, is to use the reader's multiple escape
  characters.  |Hello| is a symbol whose symbol-name is the exact
  character sequence between the vertical bars.  When you use this
  notation, you communicate intent, and both human and machine readers
  of the source code will know that case matters in this particular
  situation.
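  A minimal sketch of the multiple escape notation.  FOO is the
  function asked for in the question; here it is a hypothetical
  definition that simply calls SYMBOL-NAME, since once the case is kept
  in the symbol itself, nothing more is needed:

```lisp
;; Characters between vertical bars are taken literally: no case
;; conversion takes place, so the name keeps its spelling.
(symbol-name '|Hello|)           ; => "Hello"

;; The escaped symbol is the same one INTERN returns for that exact name.
(eq '|Hello| (intern "Hello"))   ; => T

;; Hypothetical FOO from the question: SYMBOL-NAME already does the job.
(defun foo (symbol)
  (symbol-name symbol))

(foo '|Hello|)                   ; => "Hello"
```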

  The second, as has been mentioned by others, is to use READTABLE-CASE
  to modify the case conversion behavior of the Common Lisp reader.
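  A sketch of reading under :PRESERVE without disturbing the global
  reader, by binding *READTABLE* to a private copy.  READ-PRESERVING-CASE
  is a hypothetical helper name.  The last two forms show the price: the
  standard symbols are named in upper case, so lower-case source now
  names some other symbol:

```lisp
(defun read-preserving-case (string)
  "Read one object from STRING with READTABLE-CASE :PRESERVE, using a
copy of the current readtable so the global one is left untouched."
  (let ((*readtable* (copy-readtable)))
    (setf (readtable-case *readtable*) :preserve)
    ;; VALUES drops READ-FROM-STRING's second value (the end position).
    (values (read-from-string string))))

(symbol-name (read-preserving-case "Hello"))  ; => "Hello"

;; "car" is interned as a new symbol named "car", not CL:CAR.
(eq (read-preserving-case "car") 'car)        ; => NIL
(eq (read-preserving-case "CAR") 'car)        ; => T
```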

| Currently, this means that all C++ identifier names in my Common
| Lisp code are quoted as strings.

  That should make it easy to use || instead of "", and you're on your way.
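  For the C++ identifiers, a sketch assuming they are kept in a list;
  the identifier names and the variable *CPP-METHODS* are made up for
  illustration:

```lisp
;; Hypothetical C++ method names, written as case-preserving symbols
;; instead of strings.
(defparameter *cpp-methods* '(|getValue| |setValue| |toString|))

;; SYMBOL-NAME recovers the exact spelling whenever a string is needed,
;; e.g. when generating C++ source.
(mapcar #'symbol-name *cpp-methods*)  ; => ("getValue" "setValue" "toString")
```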

  If you decide to investigate READTABLE-CASE, you will run into a very
  annoying problem: the names of the standard Common Lisp symbols are in
  upper case, which these days is useful mostly in news articles like
  this one to make them stand out from other words, so if you ask for
  :PRESERVE, you must write every standard symbol, such as CAR or DEFUN,
  in upper case and get to shout a lot.  It would have been easy to
  break with the past and declare symbol names to be lower-case back
  when Common Lisp was defined, but They chose not to, and breaking with
  the standard today is not particularly smart.  To accommodate those who
  wanted to write their code in lower-case and still use case-sensitive
  symbol names, the :INVERT case mode was defined, and this works
  sufficiently well.
  It may also be supported natively in an implementation that may choose
  to represent symbol-names in lower-case internally to cut down on the
  case conversion costs, which get more noticeable with larger character
  sets.
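  A sketch of reading under :INVERT, again on a private copy of the
  readtable; READ-INVERTING-CASE is a hypothetical helper name.
  Lower-case source reaches the upper-case standard symbols, while
  mixed-case names survive exactly as written:

```lisp
(defun read-inverting-case (string)
  "Read one object from STRING with READTABLE-CASE :INVERT, using a
copy of the current readtable so the global one is left untouched."
  (let ((*readtable* (copy-readtable)))
    (setf (readtable-case *readtable*) :invert)
    (values (read-from-string string))))

;; All-lower-case names are inverted to upper case: this is CL:CAR.
(eq (read-inverting-case "car") 'car)           ; => T
;; All-upper-case names are inverted to lower case.
(symbol-name (read-inverting-case "FOO"))       ; => "foo"
;; Mixed-case names are left exactly as written.
(symbol-name (read-inverting-case "getValue"))  ; => "getValue"
```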

  The central issue is what case COMMON-LISP:SYMBOL-NAME returns and
  COMMON-LISP:INTERN (etc) takes, and the key to this issue is to
  realize that it is completely unrelated to the internal
  representation.  An implementation is free to offer its own
  |symbol-name| and |intern| (etc) functions, in a package that it might
  call |common-lisp|, which reflect the internal representation, as long
  as it does the right thing for COMMON-LISP:SYMBOL-NAME (etc).  It
  might even decide to offer its own |readtable-case| function that
  swaps the meaning of :PRESERVE and :INVERT, and thus allow the
  external and internal case to work smoothly and effortlessly together.
  It might even decide to cache the result of applying a function
  INVERT-CASE to a symbol name if it is requested or created via the
  standard functions, so that the performance penalty would be
  negligible.  The problem is that :INVERT inverts the case of every
  character in a symbol name whose cased characters all have the same
  case, but leaves alone a name that contains at least one upper-case
  and one lower-case character.  I don't know the history of this
  decision, but I know it was painful to several parties present, and I
  have no desire to re-open this wound.  Instead, let's look at what an
  implementation that wants lower-case symbol names, a case-sensitive
  reader, and conformance to the standard would most intelligently do.
  It would /not/ do the inversion trick except when user code asks for
  the standard symbol name or creates a symbol through the standard
  functions, which is actually a very rare thing.  The important issue
  to take away from this discussion is that the Common Lisp standard
  does not mandate that symbols are stored internally in any particular
  case; the standard only mandates what various functions accept and
  return.
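  The :INVERT rule itself is easy to state as a function.  This is a
  sketch of the INVERT-CASE mentioned above, written from the rule as
  described rather than taken from any implementation.  Applying it
  twice gives back the original name, which is what would let a
  hypothetical implementation with lower-case internal names map to and
  from the standard upper-case names in both directions:

```lisp
(defun invert-case (name)
  "Apply the READTABLE-CASE :INVERT rule to the string NAME: if every
cased character has the same case, return a copy with each of them
inverted; if NAME mixes upper and lower case, return it unchanged."
  (if (and (some #'upper-case-p name)
           (some #'lower-case-p name))
      name
      (map 'string
           (lambda (char)
             (cond ((upper-case-p char) (char-downcase char))
                   ((lower-case-p char) (char-upcase char))
                   (t char)))
           name)))

(invert-case "CAR")        ; => "car"       standard -> hypothetical internal
(invert-case "car")        ; => "CAR"       hypothetical internal -> standard
(invert-case "getValue")   ; => "getValue"  mixed case is left alone
(invert-case "1+")         ; => "1+"        no cased characters at all
```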

  I think all modern Common Lisp implementations should optimize for the
  lower-case, literal symbol and should treat the upper-case symbols as
  a relic of the past that is supported via the standard functions but
  bypassed when reading and writing source code and results.  The switch
  is easily accomplished by doing the :INVERT trick in the accessors to
  symbol names first, and then gradually changing the calls to them all
  through the source code.  It will take a little while before the new
  system outperforms the old system, but the end result will be vastly
  less case conversion.  Users can prepare and encourage the whole thing
  by starting to do (setf (readtable-case ...) :invert) and the vendors
  can prepare them for the future with a package |common-lisp| that has
  variables and functions that invert the meaning of the case.
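  On the user side, a sketch of what (setf (readtable-case ...) :invert)
  buys, assuming it is done on a copy of the readtable bound around the
  code in question rather than globally.  The printer also consults the
  readtable case, so standard symbols print in lower case, and every
  symbol, mixed-case ones included, reads back as itself.
  PRINT-READ-WITH-INVERT is a hypothetical helper name:

```lisp
(defun print-read-with-invert (symbols)
  "With READTABLE-CASE :INVERT in effect on a copy of the current
readtable, return for each symbol its printed form and whether reading
that form back yields the same symbol."
  (let ((*readtable* (copy-readtable)))
    (setf (readtable-case *readtable*) :invert)
    (mapcar (lambda (symbol)
              (let ((printed (prin1-to-string symbol)))
                (list printed
                      (eq symbol (values (read-from-string printed))))))
            symbols)))

(print-read-with-invert (list 'car 'multiple-value-bind))
;; => (("car" T) ("multiple-value-bind" T))
```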

-- 
Erik Naggum | Oslo, Norway

Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.