Subject: Re: (read) problem with packages
From: rpw3@rpw3.org (Rob Warnock)
Date: Mon, 23 Apr 2007 22:04:52 -0500
Newsgroups: comp.lang.lisp
Message-ID: <lo-dnY03R9VJ7rDbnZ2dnUVZ_vKunZ2d@speakeasy.net>
Kent M Pitman  <pitman@nhplace.com> wrote:
+---------------
| Geoff Wozniak <geoff.wozniak@gmail.com> writes:
| > There's no interface function for readtables that returns
| > the syntax type of characters (that I know of -- did I miss it?)
| 
| No interface provided that I know of either.  But I think you can
| create one that's reasonably portable from whole cloth if you work
| at it ... it's just that the implementation technique is not even
| remotely straightforward.  ...
| 
| > ...nor is there a
| > reliable way to compare syntax types to the standard readtable.
| 
| Once you have a way of getting the syntax type, that's easy.
+---------------
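
For what it's worth, here is a rough sketch of the "from whole cloth"
approach using only standard functions. PROBE-SYNTAX-TYPE and
SYNTAX-DIFFERENCES are hypothetical names, not anything in the standard
or in CMUCL. GET-MACRO-CHARACTER picks out macro characters and tells
terminating from non-terminating ones. For everything else, the sketch
reads "A<char>B " and looks at what comes back. It is only a sketch:
package markers, invalid constituents, and anything else that makes the
probe READ fail come back as :UNKNOWN, and it says nothing at all about
constituent traits.

```lisp
;;; Rough, portable probing of syntax types.  PROBE-SYNTAX-TYPE and
;;; SYNTAX-DIFFERENCES are hypothetical helpers, not standard functions.

(defvar *probe-package*
  (or (find-package "SYNTAX-PROBE") (make-package "SYNTAX-PROBE" :use '()))
  "Scratch package, so the probe reads don't intern junk in CL-USER.")

(defun probe-syntax-type (char &optional (readtable *readtable*))
  "Return a rough syntax type for CHAR in READTABLE: :TERMINATING-MACRO,
:NON-TERMINATING-MACRO, :WHITESPACE, :SINGLE-ESCAPE, :MULTIPLE-ESCAPE,
:CONSTITUENT, or :UNKNOWN when the probe can't tell."
  ;; Macro characters are the one case the standard answers directly.
  (multiple-value-bind (fn non-terminating-p)
      (get-macro-character char readtable)
    (when fn
      (return-from probe-syntax-type
        (if non-terminating-p :non-terminating-macro :terminating-macro))))
  ;; Otherwise read "A<char>B " and see what a single READ returns.
  (let ((*readtable* readtable)
        (*package* *probe-package*)
        (*read-base* 10)
        (*read-eval* nil))
    (handler-case
        (let ((obj (read-from-string (format nil "A~CB " char))))
          (cond ((not (symbolp obj)) :unknown)
                ;; CHAR ended the token right after "A".
                ((string-equal (symbol-name obj) "A") :whitespace)
                ;; CHAR escaped the "B" and vanished from the name.
                ((string-equal (symbol-name obj) "AB") :single-escape)
                ;; CHAR became part of the token.
                ((= (length (symbol-name obj)) 3) :constituent)
                (t :unknown)))
      ;; An unterminated |... runs into the end of the string.
      (end-of-file () :multiple-escape)
      ;; Package markers, invalid constituents, etc.
      (error () :unknown))))

(defun syntax-differences (readtable &key (limit 128))
  "List (CHAR TYPE-IN-READTABLE STANDARD-TYPE) for each character below
code LIMIT whose probed type in READTABLE differs from the standard one."
  (let ((standard (copy-readtable nil)))   ; fresh copy of the standard
    (loop for code from 0 below limit
          for char = (code-char code)
          for mine = (and char (probe-syntax-type char readtable))
          for std = (and char (probe-syntax-type char standard))
          when (and char (not (eq mine std)))
            collect (list char mine std))))
```

E.g., after (SET-SYNTAX-FROM-CHAR #\! #\( RT) on a copy of the standard
readtable, (SYNTAX-DIFFERENCES RT) should include the entry
(#\! :TERMINATING-MACRO :CONSTITUENT).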

Having dug into the guts of CMUCL once upon a time [when trying
(successfully, I think) to build a CL reader in C], I know that
CMUCL doesn't store "the syntax type" as a single "thing", and
that SET-SYNTAX-FROM-CHAR *doesn't* always copy all of the "minor"
attributes of a character:

    (defun set-syntax-from-char (to-char from-char &optional
					 (to-readtable *readtable*)
					 (from-readtable ()))
      "Causes the syntax of to-char to be the same as from-char in the 
      optional readtable (defaults to the current readtable).  The
      from-table defaults to the standard lisp readtable when nil."
      (let ((from-readtable (or from-readtable std-lisp-readtable)))
	;;copy from-char entries to to-char entries, but make sure that if
	;;from char is a constituent you don't copy non-movable secondary
	;;attributes (constituent types), and that said attributes magically
	;;appear if you transform a non-constituent to a constituent.
	(let ((att (get-cat-entry from-char from-readtable)))
	  (if (constituentp from-char from-readtable)
	      (setq att (get-secondary-attribute to-char)))
	  (set-cat-entry to-char att to-readtable)
	  (set-cmt-entry to-char
			 (get-cmt-entry from-char from-readtable)
			 to-readtable)))
      t)
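
The same behavior can be poked at portably. The CLHS entry for
SET-SYNTAX-FROM-CHAR says the constituent traits of from-char are not
copied, so making #\( a constituent should give it its own (ordinary
alphabetic) trait. READ-WITH-SYNTAX-FROM is a hypothetical helper made
up for this illustration; it works on a fresh copy of the standard
readtable so nothing global is disturbed.

```lisp
;;; READ-WITH-SYNTAX-FROM is a hypothetical helper for illustration.
(defun read-with-syntax-from (to-char from-char string)
  "Read STRING with a fresh copy of the standard readtable in which
TO-CHAR has been given FROM-CHAR's standard syntax.  Returns the object
read, or :READER-ERROR if the reader signals a READER-ERROR."
  (let ((*readtable* (copy-readtable nil))
        (*package* (find-package "CL-USER"))
        (*read-eval* nil))
    ;; Copies FROM-CHAR's syntax type (FROM-READTABLE defaults to the
    ;; standard readtable), but not its constituent traits.
    (set-syntax-from-char to-char from-char)
    (handler-case (read-from-string string)
      (reader-error () :reader-error))))
```

(READ-WITH-SYNTAX-FROM #\( #\a "a(b") returns a symbol named "A(B", and
(READ-WITH-SYNTAX-FROM #\! #\Space "a!b") returns CL-USER::A, since #\!
is now whitespace. Going the other way, (READ-WITH-SYNTAX-FROM #\Space
#\a "a b") ought to return :READER-ERROR in a conforming implementation,
because Space's constituent trait is "invalid", which lines up with the
15 (CONSTITUENT-INVALID) in the CMUCL tables.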

Some explanation: In CMUCL, readtables contain a CHARACTER-MACRO-TABLE
and a separate CHARACTER-ATTRIBUTE-TABLE. The latter describes a few
"primary" character attributes (WHITESPACE, TERMINATING-MACRO, ESCAPE,
and CONSTITUENT) as well as a whole bunch of "secondary" attributes
that are related to the primary ones by a numerical ordering
relationship. E.g.:

    > (list lisp::whitespace
	    lisp::terminating-macro
	    lisp::escape
	    lisp::constituent
	    lisp::constituent-dot
	    lisp::constituent-expt
	    lisp::constituent-slash
	    lisp::constituent-digit
	    lisp::constituent-sign
	    lisp::multiple-escape
	    lisp::package-delimiter
	    lisp::delimiter
	    lisp::constituent-decimal-digit
	    lisp::constituent-digit-or-expt
	    lisp::constituent-invalid)

    (0 1 2 3 4 5 6 7 8 10 11 12 13 14 15)
    > (lisp::character-attribute-table *readtable*)

    #(3 3 3 3 3 3 3 3 15 0 0 3 0 0 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
      0 3 1 3 3 3 3 1 1 1 3 8 1 8 4 6 7 7 7 7 7 7 7 7 7 7 11 1 3 3 3 3
      3 3 3 3 5 5 5 3 3 3 3 3 5 3 3 3 3 3 3 5 3 3 3 3 3 3 3 3 2 3 3 3 1
      3 3 3 5 5 5 3 3 3 3 3 5 3 3 3 3 3 3 5 3 3 3 3 3 3 3 3 10 3 3 15 3
      3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
      3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
      3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
      3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3)
    >

There's also a static [initialized at startup] SECONDARY-ATTRIBUTE-TABLE
that contains such "secondary" attributes -- *which cannot be changed* --
and (as you can see in the code above) when the "from" character of a
SET-SYNTAX-FROM-CHAR is a constituent, the *target* character's own entry
in this table is installed *instead* of the "from" character's current
attribute:

    > lisp::secondary-attribute-table  

    #(3 3 3 3 3 3 3 3 15 15 15 3 15 15 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
      3 15 3 3 3 3 3 3 3 3 3 3 8 3 8 4 6 7 7 7 7 7 7 7 7 7 7 11 3 3 3 3 3
      3 3 3 3 5 5 5 3 3 3 3 3 5 3 3 3 3 3 3 5 3 3 3 3 3 3 3 3 3 3 3 3 3 3
      3 3 5 5 5 3 3 3 3 3 5 3 3 3 3 3 3 5 3 3 3 3 3 3 3 3 10 3 3 15 3 3 3
      3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
      3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
      3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
      3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3)
    > 

These two differ in the following places:

    > (loop for cat across (lisp::character-attribute-table *readtable*)
	    and sat across lisp::secondary-attribute-table
	    and i from 0
	when (/= cat sat)
	  collect (list i (code-char i) cat sat))

    ((9 #\Tab 0 15) (10 #\Newline 0 15) (12 #\Page 0 15) (13 #\Return 0 15)
     (32 #\Space 0 15) (34 #\" 1 3) (39 #\' 1 3) (40 #\( 1 3) (41 #\) 1 3)
     (44 #\, 1 3) (59 #\; 1 3) (92 #\\ 2 3) (96 #\` 1 3))
    > 

E.g., every character that is normally whitespace (Tab, Newline, Page,
Return, Space) is considered "constituent-invalid" once it has been
changed into a constituent, and likewise every character that is
normally a terminating macro character, as well as the escape character
(backslash), is considered a plain constituent once it has been changed
into a constituent. [At least, I *think* that's what it means... I'm
not entirely sure!!]
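
To make the numbers in that LOOP output easier to eyeball, here is a
tiny decoder. The code-to-name table is simply transcribed from the
CMUCL session above; the keyword labels and the names
*CMUCL-ATTRIBUTE-NAMES*, ATTRIBUTE-NAME, and DESCRIBE-DIFFERENCE are
made up for this sketch. The numeric values are CMUCL internals and
may well differ in other versions.

```lisp
;;; Attribute codes transcribed from the CMUCL session above.  They are
;;; CMUCL internals (the keywords are just labels) and may differ in
;;; other versions.
(defparameter *cmucl-attribute-names*
  '((0 . :whitespace) (1 . :terminating-macro) (2 . :escape)
    (3 . :constituent) (4 . :constituent-dot) (5 . :constituent-expt)
    (6 . :constituent-slash) (7 . :constituent-digit)
    (8 . :constituent-sign) (10 . :multiple-escape)
    (11 . :package-delimiter) (12 . :delimiter)
    (13 . :constituent-decimal-digit) (14 . :constituent-digit-or-expt)
    (15 . :constituent-invalid)))

(defun attribute-name (code)
  "Keyword label for attribute CODE, or CODE itself if it isn't listed."
  (or (cdr (assoc code *cmucl-attribute-names*)) code))

(defun describe-difference (entry)
  "Turn (CODE CHAR CURRENT SECONDARY), as collected by the LOOP above,
into (CHAR CURRENT-NAME SECONDARY-NAME)."
  (destructuring-bind (code char current secondary) entry
    (declare (ignore code))
    (list char (attribute-name current) (attribute-name secondary))))
```

E.g., (MAPCAR #'DESCRIBE-DIFFERENCE '((9 #\Tab 0 15) (34 #\" 1 3)
(92 #\\ 2 3))) returns ((#\Tab :WHITESPACE :CONSTITUENT-INVALID)
(#\" :TERMINATING-MACRO :CONSTITUENT) (#\\ :ESCAPE :CONSTITUENT)).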

Anyway, this is just meant to show that extracting "the" syntax of
a character from a given implementation is sometimes... complicated.



-Rob

p.s. There's also a DISPATCH-TABLES element, "which is an alist from
dispatch characters to vectors of CHAR-CODE-LIMIT functions, for use
in defining dispatching macros", but I'm going to ignore that here.
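
For the curious, the standard does give portable read access to a
dispatch table, one sub-character at a time, via
GET-DISPATCH-MACRO-CHARACTER. DISPATCH-SUBCHARS below is a hypothetical
helper, not a standard function. The exact list it returns is
implementation-dependent (an implementation may install functions that
merely signal errors for sub-characters the standard says are errors),
but for #\# in the standard readtable it should include things like
#\', #\(, and #\\.

```lisp
;;; DISPATCH-SUBCHARS is a hypothetical helper; it only uses the
;;; standard GET-DISPATCH-MACRO-CHARACTER.
(defun dispatch-subchars (disp-char &optional (readtable (copy-readtable nil)))
  "List the characters below code 128 that have a reader macro function
under the dispatching macro character DISP-CHAR in READTABLE."
  (loop for code from 0 below 128
        for sub = (code-char code)
        ;; Sub-characters are case-insensitive, so skip lowercase
        ;; letters to avoid listing #\a and #\A twice.
        when (and sub
                  (not (lower-case-p sub))
                  (get-dispatch-macro-character disp-char sub readtable))
          collect sub))
```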

-----
Rob Warnock			<rpw3@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607