Subject: Re: generational gc and large root set
From: rpw3@rigden.engr.sgi.com (Rob Warnock)
Date: 7 Jan 2002 03:58:16 GMT
Newsgroups: comp.lang.scheme
Message-ID: <a1b6co$1i7ha$1@fido.engr.sgi.com>
Jeffrey Siegal  <jbs@quiotix.com> wrote:
+---------------
| "Ji-Yong D. Chung" wrote:
| >     Gotcha!  I see that the solution is to make the whole
| > table into GC Heap object, that can trickle down to
| > higher generations.  In that way, it satisfies your criterion.
| 
| Exactly right, and in most Scheme programs, the symbol table is seldom
| used.  Moreover, symbol objects themselves may be seldom used, depending
| on what kind of compile-time optimization you do.
+---------------

Another view: I would claim that the whole question of GC'ing symbols
as a red herring -- what you want to avoid GC'ing are actually *global
variables*, not symbols per se. If global variables are "born tenured"
(never GC'd), and if they contain a pointer back to their corresponding
symbol (as well as a value), then only those symbols which name global
variables will be immune to collection; all other symbols (e.g., names
of local variables) are free to be collected when inaccessible. Said
another way, it is the *globals* that are part of the root set, not
the symbols per se. Hence the symbol hash table *can* be weak. Example:

	> (define foo (lambda (red-shirt-ensign) (+ red-shirt-ensign 3)))
	> (foo 4)
	7
	> (set! foo (lambda (lives-long&prospers) (+ lives-long&prospers 5)))
	> (foo 4)
	9
	>

At this point, the symbol "foo" names a global, hence will never be
collected, while the symbol "red-shirt-ensign" may be collected since
its containing procedure is inaccessible (and may also be collected).
[And "lives-long&prospers" only survives as long as "foo" isn't mutated
further.]

Of course, for a generational collector you still need a write barrier
for everything in the root set (including globals), in case they get
assigned a younger object.


-Rob

[*] The above is somewhat orthogonal from the issue of retaining
local variable names of compiled procedures for debugging purposes
(e.g., if locals are replaced with, #<display_level,offset> references),
since you only need the debugging info as long as the compiled procedure
is accessible. Ditto the "inferred names" of procedures (such as MzScheme
uses: <URL:http://www.cs.rice.edu/CS/PLT/packages/doc/mzscheme/node85.htm>).

-----
Rob Warnock, 30-3-510		<rpw3@sgi.com>
SGI Network Engineering		<http://www.meer.net/~rpw3/>
1600 Amphitheatre Pkwy.		Phone: 650-933-1673
Mountain View, CA  94043	PP-ASEL-IA

[Note: aaanalyst@sgi.com and zedwatch@sgi.com aren't for humans ]