Subject: Re: more questions about threads...
From: rpw3@rigden.engr.sgi.com (Rob Warnock)
Date: 2000/04/08
Newsgroups: comp.lang.lisp
Message-ID: <8cm6mv$175tk$1@fido.engr.sgi.com>

Joe Marshall  <jmarshall@alum.mit.edu> wrote:
+---------------
| rpw3@rigden.engr.sgi.com (Rob Warnock) writes:
| > I'm sure. My guess is that at a minimum you'd want some sort of real-time
| > or at least incremental GC, so you could run the mutator on some CPUs while
| > running the collector on others, and for efficiency you'd almost certainly
| > need separate allocation pools per mutator thread, so you could do
| > lock-free allocation "most" of the time. Stuff like that...
| 
| You want a generational GC.  Each processor would get its own youngest 
| generation and be able to allocate from it without interlocking.
+---------------

Yes, I was already assuming a generational GC, certainly with a per-thread
(or per-CPU) "nursery", but perhaps even with one or two generations beyond
that being allocated per-thread/CPU. But you might also want some sort of
incremental collection that can run entirely in parallel with the per-thread
mutators, so you don't have to lock the whole bloody mess to do a higher-
generation collection. And, gee, wouldn't it be nice if you could have
more than one of those collector threads or CPUs, too, so you could adjust
the balance between mutators (the ones that are doing the "work") and
collectors depending on the number of CPUs you had available and the kind
of problem you were solving?
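
To make the per-thread "nursery" idea concrete, here is a minimal sketch
(in Python, purely as executable pseudocode; nothing here is from a real
Lisp implementation). Each mutator thread bumps a private pointer to
allocate, and takes the one global lock only when its chunk runs out. The
names GlobalHeap, Nursery, CHUNK_WORDS, and run_nursery_demo are all made up
for illustration, "addresses" are just integers, and no collection is
actually done.

```python
import threading

CHUNK_WORDS = 1024  # assumed nursery chunk size, in words (illustrative)


class GlobalHeap:
    """Shared pool that hands out nursery chunks; the only locked path."""

    def __init__(self, total_words):
        self._lock = threading.Lock()
        self._next = 0
        self._limit = total_words
        self.chunk_grabs = 0  # how many times the global lock was taken

    def grab_chunk(self):
        with self._lock:
            if self._next + CHUNK_WORDS > self._limit:
                # A real system would trigger a collection here.
                raise MemoryError("heap exhausted")
            start = self._next
            self._next += CHUNK_WORDS
            self.chunk_grabs += 1
            return start, start + CHUNK_WORDS


class Nursery:
    """Per-thread allocation area: bump a private pointer, no lock."""

    def __init__(self, heap):
        self._heap = heap
        self._free, self._end = heap.grab_chunk()

    def allocate(self, words):
        if words > CHUNK_WORDS:
            raise ValueError("object larger than a nursery chunk")
        if self._free + words > self._end:
            # Slow path: chunk exhausted, so take the global lock once.
            self._free, self._end = self._heap.grab_chunk()
        addr = self._free
        self._free += words  # fast path: touches only thread-private state
        return addr


def mutator(heap, n_allocs, results, index):
    nursery = Nursery(heap)  # private to this thread
    results[index] = [nursery.allocate(2) for _ in range(n_allocs)]


def run_nursery_demo(n_threads=4, n_allocs=5000):
    heap = GlobalHeap(total_words=1 << 20)
    results = [None] * n_threads
    threads = [
        threading.Thread(target=mutator, args=(heap, n_allocs, results, i))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    addrs = [a for per_thread in results for a in per_thread]
    return heap.chunk_grabs, addrs
```

With these made-up numbers, 20000 two-word allocations across four threads
take the global lock only 40 times (once per chunk); every other allocation
is lock-free in the sense that it touches nothing shared.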

If you have "enough" CPUs to throw at the problem, you could afford to
spend quite a bit of overhead on fine-grained locking (say, of the sort that
introducing read barriers might add), as long as by doing so you could
avoid almost entirely any *global* locking.

That's the kind of evolution we saw on SGI's "Irix" kernel as we tuned it
for larger & larger configurations. Yes, changing all the classical Unix
global locks to fine-grained locks (e.g., in the protocol stacks, replacing
global "splnet()"s with spinlocks or semaphores on individual PCBs, devices,
etc.) did in some cases hurt single-CPU performance a little. But it also
allowed the operating system to scale well to *much* larger numbers of CPUs
(as mentioned before, up to 512 so far).
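
The shape of that change can be sketched as follows (again Python as
executable pseudocode, not anything resembling actual Irix code). Instead
of one global lock around the whole protocol stack, each control block
carries its own lock, so work on different blocks never contends. The names
Pcb, deliver, and run_pcb_demo are hypothetical, and CPython's interpreter
lock means this shows only the locking structure, not a real speedup.

```python
import threading


class Pcb:
    """Hypothetical per-connection control block with its own lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.packets = 0


def deliver(pcb):
    # Only this PCB is locked; traffic for other PCBs is not blocked,
    # unlike a global lock that would serialize every delivery.
    with pcb.lock:
        pcb.packets += 1


def worker(pcbs, rounds):
    # Spread deliveries round-robin across all the PCBs.
    for i in range(rounds):
        deliver(pcbs[i % len(pcbs)])


def run_pcb_demo(n_pcbs=8, n_threads=4, rounds=10000):
    pcbs = [Pcb() for _ in range(n_pcbs)]
    threads = [
        threading.Thread(target=worker, args=(pcbs, rounds))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [p.packets for p in pcbs]
```

The price is the one mentioned above: every delivery now pays for a lock
acquire and release even when there is only one CPU and no contention.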

That's why I was wondering if anyone had done that sort of thing for
Common Lisp yet...


-Rob

-----
Rob Warnock, 41L-955		rpw3@sgi.com
Applied Networking		http://reality.sgi.com/rpw3/
Silicon Graphics, Inc.		Phone: 650-933-1673
1600 Amphitheatre Pkwy.		PP-ASEL-IA
Mountain View, CA  94043