Subject: Re: Buggy BinarySearch implementation, real life and a curious soul...
From: rpw3@rpw3.org (Rob Warnock)
Date: Fri, 09 Jun 2006 01:19:58 -0500
Newsgroups: comp.lang.lisp
Message-ID: <r6-dnRoPa6WTjhTZnZ2dneKdnZydnZ2d@speakeasy.net>
Thomas A. Russ <tar@sevak.isi.edu> wrote:
+---------------
| rpw3@rpw3.org (Rob Warnock) writes:
| > [1] Trivia: ... [In CMUCL] fixnums take up both the #b000 and #b100
| > codepoints (for even & odd fixnums, resp.)...
| 
| So does that mean EVENP and ODDP are implemented as type tests?  ;)
+---------------

Hardly! From "cmucl/src/compiler/srctran.lisp":

    (def-source-transform evenp (x) `(zerop (logand ,x 1)))
    (def-source-transform oddp (x) `(not (zerop (logand ,x 1))))

where DEF-SOURCE-TRANSFORM is heavy internal compiler macrology...  ;-}  ;-}

From there on it depends on the compiler to optimize the generic
LOGAND+test if it can, which it can if you declare the arg fixnum:

    > (disassemble
	(compile
	  (defun foo (x)
	    (declare (fixnum x))
	    (if (evenp x)
	      123
	      456))))
    ...[chatter]...
    58946B78:       .ENTRY FOO(x)       ; (FUNCTION (FIXNUM) (OR # #))
      ...
      ...[generic arg-counting type-checking entry code]...
      ...
      42:       MOV     EAX, EBX        ; No-arg-parsing entry point
      44:       AND     EAX, 4          ; See? It knows which bit to test
      47:       TEST    EAX, EAX
      49:       JEQ     L1
      4B:       MOV     EDX, 1824       ; fixnum encoding for 456
      50: L0:   MOV     ECX, [EBP-8]    ; code for doing a return
      53:       MOV     EAX, [EBP-4]
      56:       ADD     ECX, 2
      59:       MOV     ESP, EBP
      5B:       MOV     EBP, EAX
      5D:       JMP     ECX

      5F: L1:   MOV     EDX, 492        ; fixnum encoding for 123
      64:       JMP     L0
    ...

If you replace EVENP with ODDP the code is identical except that
the values of the "MOV EDX,nnn" immediate constants get swapped.
[In terms of branch prediction, I suppose you could say the compiler
is always predicting that fixnum arguments are even.]

+---------------
| I suppose it is a clever way to get 7 type codes while still having 30
| bit integers.  Better than the naive solution of using 2 type bits and
| only getting 4 codes.
+---------------

I've seen other Lisps/Schemes in which fixnums use N-1 or N-2 bits,
pointers use N-2 or N-3 bits, and immediates chew up a byte or so
of type, leaving N-8 bits or some irregular sizes for the value.
In their docs you'll see complex type-decoding patterns like this,
from SCM's "code.doc":

        IMMEDIATE:  B,D,E,F=data bit, C=flag code, P=pointer address bit
                ................................
	pointer PPPPPPPPPPPPPPPPPPPPPPPPPPPPP000
	inum    BBBBBBBBBBBBBBBBBBBBBBBBBBBBBB10
	ichr    BBBBBBBBBBBBBBBBBBBBBBBB11110100
	iflag                   CCCCCCC101110100
	isym                    CCCCCCC001110100
		IMCAR:  only in car of evaluated code, cdr has cell's GC bit
	ispcsym                 000CCCC00CCCC100
	iloc    0DDDDDDDDDDDDDDDEFFFFFFF11111100
	gloc    PPPPPPPPPPPPPPPPPPPPPPPPPPPPP001

CMUCL's encoding is fairly tame by comparison. From "internal-design.txt"
[out-of-date, so I've updated it to reflect a change to codepoint #b110
which let them go above 32 heap object types, since heap header words were
originally tagged as "other-immediate"s (now "other-immediate-{0,1}")]:

    The following is a key of the three bit low-tagging scheme:
       000 even fixnum
       001 function pointer
       010 other-immediate-0 (header-words, etc.)
       011 list pointer
       100 odd fixnum
       101 structure pointer
       110 other-immediate-1 (chars, symbol-value trap value, etc.)
       111 other-pointer to data-blocks (other than conses, structures,
					 and functions)

So if the lispobj it's odd, it's a pointer; evens are fixnums and immediates.


-Rob

-----
Rob Warnock			<rpw3@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607