Subject: Re: Bootstrapping ANSI CL
From: rpw3@rpw3.org (Rob Warnock)
Date: Fri, 15 Jun 2007 03:37:52 -0500
Newsgroups: comp.lang.lisp
Message-ID: <aamdnXlNRvt90u_bnZ2dnUVZ_vCknZ2d@speakeasy.net>
Paul Wallich  <pw@panix.com> wrote:
+---------------
| Gene wrote:
| > As a practical matter, it may be easier to bootstrap a language other
| > than CL, e.g. try ANSI C to get a compiler that will compile gcc, then
| > use this to build one of the standard C-based CL systems.
| 
| How well specified is clisp's virtual machine, and how big is the set of 
| primitives/subprimitives? It seems to me that for the purpose you're 
| sort of thinking about, being able to implement a simple VM would 
| probably suffice. (ABCL might be another useful candidate, since the 
| specification of the Java virtual machines will likely be around for a 
| while.)
+---------------

CMUCL's byte codes might also be a good choice. They run in the
same VM as normal compiled-to-machine-code CMUCL functions, and
byte-compiled functions can can & be called by native-compiled
functions. If I'm reading the CMUCL source correctly, there are
only ~23 bytes codes you need to implement [though some of them
have fairly hairy semantics under the covers].

To implement the byte-code VM on bare metal, I would write a
cross-assembler in CL, then write code the interpreter for the
byte-code VM in a DSL consisting of CL macros that expanded
to assembler source [well, "source" here being one s-expr per
instruction], much like the "VOPs" in the CMUCL native compiler
itself. [You can use CMUCL's own byte-code interpreter as a
model.] Add some start-up & driver code (in the same assembler),
cross-assemble, cross-link, & copy over, and you have your "VM".

Then just use the normal CMUCL compiler to byte-compile any code
you want [such as the full CMUCL compiler itself!!], and copy the
byte-compiled FASLs across to run in your VM. [Yes, this means the
VM will also have to be taught how to read CMUCL ".bytef" FASLs.]

It's interesting that this topic is coming up at this point.
For some time now, I've actually been discussing this [building
a standalone CMUCL byte-code VM, only in C, not assembler] with
some friends as a possible way to simplify the bootstrapping of
CMUCL itself, with the goal being same one as achieved by the SBCL
project -- that is, to be able to compile the system "from scratch"
with only a C compiler for the target architecture and only "some"
running CL implementation available (on the same or a connected
machine) -- but using a rather different approach (the byte-code VM)
to get to that goal.

+---------------
| On the one hand, compiling to some kind of byte code will be
| slower (execution-wise) than compiling to bare metal...
+---------------

Yes, significantly slower. The following function:

    (defun spin (x)
      (dotimes (i x)
	(declare (fixnum i x))))

when run in CMUCL's interpreter, byte-compiled, and native-compiled
modes on an x86 architecture takes about 7400, 700, and 2 cycles per
iteration, respectively.[1]  That is, for this particular trivial
micro-benchmark, byte-compiled code is about 350 times slower than
native machine code, and raw interpreted code[2] is another 10+ times
slower than that.

But, hey, what's a mere factor of 350x amongst friends during
bootstrapping?!?  ;-}  ;-}


-Rob

[1] Note: On x86, the native-compiled inner loop for SPIN is
    these four instructions [and, yes, the "MOV ECX, EBX" could
    be hoisted outside the loop!]:

          L0:   ADD     EAX, 4
          L1:   MOV     ECX, EBX
                CMP     EAX, ECX
                JL      L0

    On Athlon CPUs [including Athlon-64 in 32-bit mode], the speed
    of this four-instruction loop depends on the memory alignment
    of the compiled code. I haven't done an extensive analysis, but
    it appears that some alignments produce the 2 CPU-cycle/iteration
    speed I report above, while some alignments yield 3 cycles/iteration.
    I have seen the speed of SPIN change from one to the other (or back)
    as a result of a GC.

[2] Well, even "interpreted" code in CMUCL is "IR1-converted" before
    being run, so that macros are only expanded once.

-----
Rob Warnock			<rpw3@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607