Daniel C. Wang <email@example.com> wrote:
| Joe Marshall <firstname.lastname@example.org> writes:
| > The interesting thing is that the cost of setting up and using the
| > heap-based frame exceeds the cost of setting up, using, and
| > *deallocating* the stack-based frame.
| The real performance penalty is the loss of cache locality. A few extra
| instructions per frame should get lost in the noise.
And the cache-locality issue can be handled (at least, in a generational
copying collector) by adopting Ungar's suggestion of having a single
*fixed* "nursery" area for allocating generation-0 objects. If the size
of this nursery area is matched to the secondary cache size of the CPU,
then it stays "hot" in the caches, just like with a conventional stack.
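Here is a minimal C sketch of a fixed nursery. It assumes a 512 KB secondary cache, and the size, the names `nursery_alloc` and `minor_gc`, and the stubbed-out collector are all hypothetical. Allocation is a pointer bump inside one fixed array. When the array fills up, the collector empties it and allocation restarts at the same addresses, so the same cache lines get reused, much as a stack keeps reusing its top region.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical nursery size, assumed to match a 512 KB secondary cache. */
#define NURSERY_BYTES (512u * 1024u)

static _Alignas(16) unsigned char nursery[NURSERY_BYTES];
static size_t nursery_top = 0;          /* bump pointer (offset into nursery) */
static unsigned minor_collections = 0;

/* Placeholder for a real minor GC. A generational copier would evacuate
   live generation-0 objects to an older generation here; this sketch
   only counts collections and empties the nursery. */
static void minor_gc(void) {
    minor_collections++;
    nursery_top = 0;    /* the same fixed region is reused, so it stays hot */
}

/* Bump-pointer allocation in the fixed nursery. */
static void *nursery_alloc(size_t n) {
    n = (n + 15u) & ~(size_t)15u;       /* keep 16-byte alignment */
    assert(n <= NURSERY_BYTES);
    if (nursery_top + n > NURSERY_BYTES)
        minor_gc();
    void *p = nursery + nursery_top;
    nursery_top += n;
    return p;
}
```

Because `minor_gc` here only resets the bump pointer, objects allocated before a collection must not be used after it. Filling in the evacuation step is exactly the job of the generational copying collector.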
By the way, this is in fact exactly what happens when you use Baker's
"Cheney on the MTA" approach -- the portion of the C stack used by the
CotMTA code *is* the generation-0 allocation area.
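A heavily simplified C sketch of the Cheney-on-the-MTA control flow follows. The names `sum_step`, `sum_to`, and `STACK_BUDGET`, as well as the 64 KB budget, are hypothetical. Code is written in continuation-passing style, so C functions never return and each new "frame" is allocated on the C stack. When the stack grows past the budget, the live data is copied out and `longjmp` discards the whole C stack at once. In Baker's scheme that copy is a Cheney-style copying collection of every live stack object. Here only one small struct is ever live, so the "collection" is a single struct copy into a static variable.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdint.h>

/* Hypothetical budget for the slice of C stack used as generation 0. */
#define STACK_BUDGET (64 * 1024)

/* One "frame" of the computation: acc + (i + ... + n). */
typedef struct {
    long long i;    /* next number to add */
    long long n;    /* upper bound */
    long long acc;  /* running sum */
} SumState;

static jmp_buf trampoline;       /* where the whole C stack is popped to */
static char *stack_base;         /* address of a local in sum_to() */
static SumState resume_state;    /* stands in for the older generation */
static unsigned restarts;        /* how many times the stack was reset */
static long long result;

/* Not portable ISO C: estimates stack use from addresses of locals,
   handling either direction of stack growth. */
static int stack_exhausted(void) {
    char here;
    uintptr_t a = (uintptr_t)&here, b = (uintptr_t)stack_base;
    uintptr_t used = a > b ? a - b : b - a;
    return used > STACK_BUDGET;
}

/* CPS step: never returns. The next frame lives on the C stack. */
static void sum_step(SumState *s) {
    if (s->i > s->n) {              /* finished: deliver the answer */
        result = s->acc;
        longjmp(trampoline, 2);
    }
    if (stack_exhausted()) {        /* "minor GC": evacuate the live frame */
        resume_state = *s;
        longjmp(trampoline, 1);     /* discard the entire C stack at once */
    }
    SumState next = { s->i + 1, s->n, s->acc + s->i };
    sum_step(&next);                /* this call never comes back */
}

/* Trampoline: computes 1 + 2 + ... + n. */
static long long sum_to(long long n) {
    char base_marker;
    stack_base = &base_marker;
    restarts = 0;
    resume_state = (SumState){ 1, n, 0 };
    switch (setjmp(trampoline)) {
    case 2:
        return result;
    case 1:
        restarts++;
        break;
    default:
        break;
    }
    SumState s = resume_state;      /* copy the survivor onto a fresh stack */
    sum_step(&s);
    return -1;                      /* not reached */
}
```

Every `longjmp` throws away all frames at once, so the same region of C stack just above `sum_to` is reused over and over, which is what keeps it cache-hot.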
| Also note that for standard C calling conventions you usually have to
| install a "frame-pointer" that points to the previous allocation frame.
| So in some sense heap allocation is as cheap as stack allocation on a
| per-frame basis.
*IF* you have already resigned yourself to using the C calling sequence --
but many (most?) commercial-quality Lisp & Scheme systems *don't* do that...
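To make the quoted per-frame comparison concrete, here is a C sketch of heap-allocated frames linked through a previous-frame pointer. It uses a fixed array as a stand-in for a garbage-collected heap, and the `Frame` layout and the `push_frame`/`pop_frame` names are hypothetical. Pushing a frame costs a pointer bump plus one store of the link, roughly what a C prologue does when it saves the old frame pointer. Popping just follows the link and frees nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical frame layout: a dynamic link plus two local slots. */
typedef struct Frame {
    struct Frame *prev;   /* like the saved frame pointer in a C frame */
    long slots[2];        /* arguments / locals */
} Frame;

#define HEAP_FRAMES 4096
static Frame frame_heap[HEAP_FRAMES];  /* stand-in for a GC'd nursery */
static size_t frame_top = 0;           /* bump pointer into frame_heap */
static Frame *current = NULL;          /* the "frame pointer register" */

/* Push: one bump plus one store of the link. There is no GC in this
   sketch, so running out of space is simply an assertion failure. */
static Frame *push_frame(void) {
    assert(frame_top < HEAP_FRAMES);
    Frame *f = &frame_heap[frame_top++];
    f->prev = current;
    current = f;
    return f;
}

/* Pop: follow the link. Nothing is freed; a collector would later
   reclaim frames that are no longer reachable. */
static void pop_frame(void) {
    assert(current != NULL);
    current = current->prev;
}
```

Since popping frees nothing, a popped frame stays valid as long as something still references it (a closure or a captured continuation, say). Reclaiming unreachable frames is left to the GC, which this sketch omits.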
| In the presence of exceptions and multi-threading the stack-based
| implementations become very complex, while the heap-based systems remain
| almost unchanged.
Yup, which is why I personally think heap-allocated frames are still worth considering as
an implementation strategy (along with an appropriate GC, of course).
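A C sketch of why exceptions and threads stay simple with heap frames follows. The `Frame` and `Thread` layouts and the `enter` and `throw_to_handler` functions are hypothetical, and cleanup forms such as `unwind-protect` or `dynamic-wind` are deliberately ignored. Each frame records its innermost enclosing handler frame, so a throw is a single assignment to the thread's frame register. Each thread is then just another frame register, not a separately reserved, fixed-size C stack.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FRAMES 64

/* Hypothetical heap frame: dynamic link plus the innermost handler. */
typedef struct Frame {
    struct Frame *prev;
    struct Frame *handler;   /* nearest enclosing handler frame, or NULL */
    int id;
} Frame;

/* Hypothetical per-thread state: a frame register and a frame pool
   (a stand-in for a shared, garbage-collected heap). */
typedef struct {
    Frame *current;
    Frame pool[MAX_FRAMES];
    size_t top;
} Thread;

/* Enter a frame; a handler frame installs itself as the handler,
   any other frame inherits the caller's handler. */
static Frame *enter(Thread *t, int id, int is_handler) {
    assert(t->top < MAX_FRAMES);
    Frame *f = &t->pool[t->top++];
    f->prev = t->current;
    f->id = id;
    f->handler = is_handler ? f : (t->current ? t->current->handler : NULL);
    t->current = f;
    return f;
}

/* Throw: resume at the handler by resetting the frame register.
   Intermediate frames are simply dropped for the GC. Returns 0 if
   there is no handler. Cleanup forms are not modeled. */
static int throw_to_handler(Thread *t) {
    Frame *h = t->current ? t->current->handler : NULL;
    if (h == NULL)
        return 0;
    t->current = h;
    return 1;
}
```

A stack-based implementation would instead have to unwind (or `longjmp` across) real C frames and reserve a worst-case stack per thread. Here neither throwing nor creating a thread touches anything but frame pointers.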
Rob Warnock, 41L-955 email@example.com
Applied Networking http://reality.sgi.com/rpw3/
Silicon Graphics, Inc. Phone: 650-933-1673
1600 Amphitheatre Pkwy. PP-ASEL-IA
Mountain View, CA 94043