From ... From: Erik Naggum Subject: Re: Please help!! Date: 1999/02/09 Message-ID: <3127541349478598@naggum.no> X-Deja-AN: 442359073 References: <36BCA5F9.F7B8AE61@hotmail.com> <3127376799514814@naggum.no> mail-copies-to: never Organization: Naggum Software; +47 8800 8879; http://www.naggum.no Newsgroups: comp.lang.lisp * Erik Naggum | it is generally unwise to assume that you can use an existing stack frame | when entering a function anew, even the _same_ function. to do that | reliably, you use a looping construct that has control over the variable | assignment and works _inside_ the stack frame and the known function body. * pk@kuovi.cs.tut.fi (Kellom{ki Pertti) | Could you elaborate on this? I can believe that there may be some | complications in more complicated situations, but I fail to see the | problems in compiling e.g. | | (define (last lst) | (if (null? (cdr lst)) | (car lst) | (last (cdr lst)))) | | Or are you referring to the possibility of a redefinition? no, redefinition is an additional reason to avoid optimizing away the stack frame release. let's consider a function call in all its glory. the arguments are evaluated before anything interesting happens, but we'll skip that part and just assume that all the values to pass are available. there are several ways to pass arguments to functions, but let's consider the simplest case: a stack-allocated vector. in languages with variable numbers of arguments, you need to pass the length of this vector. that's typically passed in a register, so not to complicate things, let's assume it is not part of the vector. (the stack-allocated vector is typically set up with PUSH instructions, but it helps to view it as a vector.) the actual function call is made, and this is typically very fast, like keeping the next instruction address somewhere and transfering control to the start of the new function, but in Lisp there typically needs to be some lookup to find the actual address. this is usefully handled by a trampoline -- a function that does something important but does not disturb the stack frame other than in the expected ways. (it's called a trampoline because you jump on/to it and it helps you jump elsewhere.) we have arrived at the start of the our code, where there's an prologue waiting for us. the prologue may check the argument count, do call counting statistics, check for interrupts, check for space on the stack, set up a stack frame, process the argument list (&optional with default values, &key arguments), initialize &aux variables, etc, and the order of these things is determined by a lot of factors. the interesting things to us are: error handling, the location of the argument vector, the space requirements, the initialization of lambda-bound variables. the body of the function is not relevant to us, but after the body (in time) is an epilogue that needs to clean up from the prologue, unwind the stack, and return to wherever we were called from. a function is allowed to maintain an inconsistent stack state as long as it unwinds correctly. also note that CPU's suffering from register starvation (Intel), use the stack for temporary memory in some cases, and that local variables are allocated as part of this procedure. now, to make a tail-call that does not need to unwind the stack, and which jumps to the location after the stack frame has been set up, we need to know the prologue very intimately. of course, a compiler writer does that, but we also need to make sure that everything that happens before the stack frame is set up is handled by the tail-call jump. suppose that this is a non-trivial amount of work in the general, such as computing the size of a stack-allocated &rest list and such. is it worth the trouble to recognize simple functions rather than go through the epilogue and the prologue again? and since you cannot actually optimize away the self-tail call completely in the general case, it may be a _lot_ of work to recognize the simple case in Common Lisp. now, Scheme has decided to make this decision pay off, but I do wonder whether they actually manage to make tail calls as efficient as loops. it is not possible in the general case. further complications, by the way, occur when you make a tail call from inside a let binding, and that's what we normally do. if the values are stack-allocated, the semantics of tail-call changes from a jump before the epilogue to a jump after the epilogue. a special binding prohibits the epilogue from running before the callee has returned. etc, etc. these further complications are, however, fairly easy to spot and argue with. the harder problems occur when even their absence makes it unsafe to assume you can just jump directly to the point after the prologue. now for an illustrative example. when compiled with (optimize (speed 3) (safety 0) (debug 0)), the simple function (defun foo (x y z) (list x y z)) should cause a very simple jump to the LIST function, or whatever takes care of it with three arguments. here's the SPARC disassembly with Allegro CL 4.1.3: ld [%g4 + 95], %g2 ; qlist3 jmp %g2 + 0 xor %g0, #x3, %g3 (the SPARC is explicitly pipelined, so the XOR is actually executed before the first instruction at the jump target. %G0 is a constant 0, so this sets %G3 to 3. that's the argument count register. where the other argument values are is immaterial since list is given the exact same arguments as foo got.) the Intel disassembly with ACL 5.0 is somewhat different. pushl ebp movl ebp,esp pushl esi subl esp,$44 addl esp,$12 pushl [ebp+16] ; (argument 2) pushl edx pushl eax xorl ecx,ecx movb cl,$3 call *[edi+95] ; qlist3 leave movl esi,[ebp-4] ret note that the tail-call did in fact not get merged, but consider the slightly simpler function (defun bar (x y) (list x y)) the SPARC assembly is identical to the previous function, except that QLIST2 is called instead of QLIST3. the interesting thing happens to the Intel code: xorl ecx,ecx movb cl,$2 jmp *[edi+47] ; qlist2 in other words, the number of arguments can itself be an impediment to certain optimizations, and the number may vary from architecture to architecture. your particular function compiles to a self-tail-call merge, but still goes through the symbol-function slot of the symbol naming the function. #:Erik -- Y2K conversion simplified: Januark, Februark, March, April, Mak, June, Julk, August, September, October, November, December.