Subject: Re: Conservative vs. precise GC
From: (Rob Warnock)
Date: Sat, 11 Jan 2003 07:48:11 -0600
Newsgroups: comp.lang.lisp
Message-ID: <>
Basile STARYNKEVITCH  <> wrote:
| Among some of the precise garbage collectors for C, there is qish
| (written by me) - see
| [which] contains a precise generational copying GC (with fixed
| finalized objects) which should be usable in various C applications
| (provided you follow some very strict coding guidelines).

See <URL:> for the "Extension
Language Toolkit" a.k.a. "Elk", a dialect of Scheme. Elk was written in C,
and Elk-3.0 [1995] contains a precise garbage collector -- actually, *two*
of them (compile time option): a classic Cheney twin-arena copying collector,
and a generational/copying GC. From "doc/cprog/":

	At the time Elk was designed [1987], conservative GC was still in
	its infancy and sufficient experience did not exist. For this reason,
	and because of the implied risks on certain machine architectures,
	the inherent portability problems, and the inability to precisely
	determine the actual memory utilization, a traditional [that is,
	precise --rpw3] GC strategy was chosen for Elk.

Like Qish, Elk required considerable discipline on the part of the
programmer writing extensions in C, but did provide some handy macros
and utilities to make life easier. E.g., here's some code (loc. cit.)
for a sample vector NREVERSE, showing how to use the GC support calls:

	Object p_vector_reverse(Object vec) {
		Object ret;
		int i, j;

		Check_Type(vec, T_Vector);
		ret = Make_Vector(VECTOR(vec)->size, False);
		for (i = 0, j = VECTOR(vec)->size; --j >= 0; i++)
			VECTOR(ret)->data[i] = VECTOR(vec)->data[j];
		return ret;

If you *never* do any GC allocation in your routine [and never call
any subroutines which do] you don't need to do this, but if you do,
you need to declare a local "GC_Node". Then use "GC_Link(foo)" to flag
"foo" as naming a place that is (temporarily) added to the root set.
[There's also "GC_Node2/GC_Link2(x,y)", 3, 4, etc., if you need to
protect more than one variable per call.] Then be sure to call "GC_Unlink"
before returning. That's pretty much it. (Not *too* painful, eh?)

Each "Object" subtype needs to provide a handful of callbacks for the GC 
so an object of that type can been scanned. This removes any restriction
on the exact format of objects (i.e., an object can freely intermix
GC'd sub-objects and raw bits), since it's the object-type callbacks
that actually scan that type, not the core GC.

I've written code in other environments using the Elk model (with a
dirt-simple copying collector written from scratch), and it's really
not too painful if, as Basile notes, the decision to use a precise
collector is made up front. Conversely, retrofitting code using a
conservative collector [such as Boehm's] to be "safe" with a precise
collector is fraught with danger, mainly due to simple clerical
errors -- overlooking a data/control flow path somewhere that needed
a "GC protect" in it.


p.s. For the terminally curious, this code:

	Object p_vector_reverse(Object x, Object y, Object z) {
		Object ret;

		GC_Link3(x, y, z);
		...something that does allocation...
		ret = ...some result... ;
		return ret;

expands into this [approximately, the actual code was slightly worse!]:

	Object p_vector_reverse(Object x, Object y, Object z) {
		Object ret;
		Object *node[5];
		extern Object *GC_List;

		node[1] = 3;		/* #pointers in this node */
		node[2] = &x;
		node[3] = &y;
		node[4] = &z;
		node[0] = GC_List;	/* save existing roots */
		GC_List = &node;	/* add node to root set */

		...something that does allocation...
		ret = ...some result... ;

		GC_List = node[0];	/* remove from root set */
		return ret;

Rob Warnock, PP-ASEL-IA		<>
627 26th Avenue			<URL:>
San Mateo, CA 94403		(650)572-2607