Subject: Re: List diagrams -- Siebel and Touretzky draw them differently
From: rpw3@rpw3.org (Rob Warnock)
Date: Mon, 19 May 2008 21:25:23 -0500
Newsgroups: comp.lang.lisp
Message-ID: <ONSdnaxOYOgOq6_VnZ2dnUVZ_rzinZ2d@speakeasy.net>
Rainer Joswig  <joswig@lisp.de> wrote:
+---------------
| Lisp has two different types of objects:
| * objects that have an identity (cons cells, arrays, ...)
| * objects where the 'same' object is possibly not the
|   same in memory. numbers are such a case. the number
|   5 is usually not represented by one object for the number 5
|   and everywhere this number is used there is a pointer to this number.
|   That's not the case.
+---------------

[Aside: SIOD Scheme is an interesting mix of these, since while all
numbers are heap-allocated, a few small integers (typically 2048) are
allocated only once at startup and "interned", as it were, so that
*any* subsequent arithmetic operation which results in one of those
"interned" values gets a pointer to the "interned" (shared) instance
of that value, whereas all other result values are freshly allocated
on the heap each time they are generated as a arithmetic result.
In CL syntax (and assuming 2048 such "inums" were defined), executing
READ-FROM-STRING("(0 1 2046 2047 2048 2048 2049 2049)") would only
allocate four new integers -- but two copies each of 2048 & 2049!]

+---------------
| So sometimes a Lisp system can optimize the storage of an object
| like a number such that it stores the object directly into the slot
| of the datastructure.
| But I don't think this is usually done for cons cells on the heap.
+---------------

Actually, for fixnums & chars [and perhaps also for some other "small"
types like NIL, <unbound>, or <GC-fwd-ptr>] it almost always *IS* done!!
The trick is that the type is encoded into the same amount of space
as a Lisp pointer, and distinguished from a pointer with a few "tag"
bits stolen from the lowest (usually) few bits of the "pointer".[1]

Since such "immediate" values takes the same space as a pointer,
they are stored into other Lisp objects -- including cons cells --
exactly the same as actual pointers to heap objects.


-Rob

[1] For byte-addressed machines, if one chooses to force heap objects
    to be allocated & aligned on (say) 8-byte boundaries [which is how
    big a cons cell would be for a 32-bit machine], then it's convenient
    to allocate the least-significant 3 bits of each Lisp value to the
    "lowtag", since *if* it's a pointer-type value than the machine
    address of the associated heap object can always be found by masking
    off (or subtracting out "for free" using address arithmetic in the
    instruction set) the low three bits. E.g., CMUCL currently uses
    these lowtags [note that fixnums use *two* lowtag codepoints, which
    allows fixnums to be signed 30-bit numbers instead of 29-bit],:

    000 even fixnum
    001 function pointer
    010 other-immediate-0 (header-words, chars, symbol-value trap value, etc.)
    011 list pointer (cons cell or NIL)
    100 odd fixnum
    101 CLOS instance or structure pointer
    110 other-immediate-1 (overflow from other-immediate-0)
    111 other-pointer (heap data-blocks other than functions, conses, or
		       structs/CLOS_objs -- symbols, strings, arrays, etc.)

-----
Rob Warnock			<rpw3@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607