From ...
From: Erik Naggum <erik@naggum.no>
Subject: Re: Please help!!
Date: 1999/02/07
Message-ID: <3127376799514814@naggum.no>#1/1
X-Deja-AN: 441633704
References: <36BCA5F9.F7B8AE61@hotmail.com> <m2zp6rrwo7.fsf@KludgeUnix.com> 	<joswig-0702990223440001@194.163.195.67> <m2ww1uwx1h.fsf@KludgeUnix.com>
mail-copies-to: never
Organization: Naggum Software; +47 8800 8879; http://www.naggum.no
Newsgroups: comp.lang.lisp

* Steve Gonedes <sgonedes@worldnet.att.net>
| Anyway, is the following function candidate for tail-call-elimination?
| I just don't see how it is possible - even if the compiler knows no
| redefinition is going to occur.
| 
| (defun make-nums (count &optional (result ()))
|   (cond ((zerop count) result)
|      (t (make-nums (1- count) (cons count result)))))

  here's the last function call compiled with tail call merging.

	leave
	movl	ebx,[esi+50]    ; make-nums
	jmp	*[edi+39]       ; tramp-two

  and here it is without:

	movl	ebx,[esi+50]    ; make-nums
	call	*[edi+39]       ; tramp-two
	leave
	ret

  (yeah, the CPU is an Intel Pentium.  sorry about that, but at least it
  may be well known.)

  to understand this, consider that a function call pushes the instruction
  pointer on the stack, _jumps_ to the new code, which sets up a stack
  frame with an ENTER instruction, does its work, performs LEAVE to unwind
  the stack, and finally returns to the caller with a RET instruction which
  pops the return address from the stack, and _jumps_ to it.

  consider the normal call case first: every call is made within the stack
  frame set up at each call, so every call consumes a fairly large amount
  of space, typically 50 to 100 bytes.  then all it does when it returns is
  unwind the stack frame and return to the its caller, which does the same
  as many times as the function has called itself.  pretty boring.  (we can
  all be thankful that CPU's aren't self-aware or they'd just revolt.)

  instead of making a new function call, we can unwind the stack frame if
  there is nothing on it that we need (even Intel has half a register left
  that can be used to pass arguments), and then call the function with our
  caller's return address still on the stack.  all we keep around is the
  address of our caller.  everything else is still set up at every call,
  but it happens over and over in the _same_ area of memory instead of new
  memory at every call.

  some people believe that a tail call is as efficient as a loop, but they
  either don't know their CPU very well (typical of some people's arrogance
  towards their tools), and/or they believe in magic or propaganda, neither
  of which are proven reliable sources of technical information.  it is
  generally unwise to assume that you can use an existing stack frame when
  entering a function anew, even the _same_ function.  to do that reliably,
  you use a looping construct that has control over the variable assignment
  and works _inside_ the stack frame and the known function body.

  in the event that the compiler actually transforms a self-recursive tail
  call into a loop, things turn a bit different, since we're no longer
  calling the function via the variable's value (in Scheme).  to make such
  a transformation, one would have to know that no side effects are made
  during the loop, as you have pointed out.  so generally, you can't, and
  the special cases where you can are probably a lot easier done with a
  real loop, anyway, despite the attraction of syntactic recursion.

#:Erik
-- 
  Y2K conversion simplified: Januark, Februark, March, April, Mak, June,
  Julk, August, September, October, November, December.