From: duan (Duane Rettig)

Subject: Re: [spr15461] Eq-hash tables and gc

Date: 1996-10-24 17:32


Thomas Weigert,

>> I have an application which does generate significant amounts of garbage, >> and there is no easy way around it. I also make extensive use of eq-hash >> tables. I am wondering whether these are actually a disadvantage in the >> presence of the gc. I am assuming that eq-hash tables hash on the address >> of the object, and after every gc we would have to rehash the tables. Is >> this understanding correct, and if so, are there any rules of thumb for >> deciding which type of table to use, given the suspected interaction with >> the gc. Most of my tables need to be quick in access, and are not often >> modified. >> >> Any hints are appreciated,
It is true that eq and eql hash tables require certain extra consideration, given that objects can move in the lisp heap. We've mostly mitigated this problem by providing the following algorithm for bot eq and eql hash tables: We give eq/eql hash tables three states regarding their requiring a rehash due to object movement: 1. Clean: the hash table does not need to be rehashed at all; any accesses or setfs to this hash table will not require a rehash. 2. Dirty after scavenge: Some objects in this hash table may have changed their hash codes after a scavenge, so the first access/set after the first scavenge will require a rehash prior to the operation. 3. Dirty after global-gc: This state occurs if there are no objects which might change hash codes after a scavenge, but which might change hash codes after a global-gc. We also give objects themselves one of two attributes: 1. Specially-sxhashable: objects which have hash codes embedded in them or which can be very quickly hashed without regard to the address of the object are said to be specially-sxhashable. This includes all numbers, all characters, all symbols (including nil), all function-objects, and all standard-instance objects. 2. Non-specially-sxhashable objects: Those objects that do not fall into the above category must be hashed by address. Given the above two dimensions, we can describe the hashing algorithm for eq and eql hash-tables: Suppose a hash table is empty. It is thus clean. If you add a key object to the hash table by saying (setf (gethash key ht) value), then the hash- table may become dirty based on the key: a. If the key is specially-sxhashable, the hash-table is not made dirty after the setf. (However, if the hash table was dirty at all before the setf, it does not make it clean). b. If the key is not specially-sxhashable, but has been tenured (i.e. the object is in old-space) the hash table is made dirty-after-global-gc (however, if the hash-table had been dirty-after-scavenge, it is not changed). c. If the key is not specially-sxhashable, and is new (i.e. in new-space) then the hash-table is made dirty-after-scavenge. d. If a rehash is required, then the hash-table is started out as clean, and every key item is reconsidered at the time of the rehash as if it were going in for the first time. So if a key had been new and thus caused a dirty-after-scavenge, but became tenured some time before the rehash, then at the rehash time that key item will only cause a dirty-after-global-gc (which is less drastic). Thus, whether or not an eq/eql hash table must rehash is based on the keys that are used. My best advice on maximizing efficiency in this system is: 1. If possible, use only specially-sxhashable keys. For the most part, this corresonds with the most common usages of keys, such as symbols, numbers, and CLOS instances, but unfortunately it does not include conses/lists, which are also common, but unfortunately not specially- sxhashable. 2. If #1 is not possible, try to use keys that have not just been created. If you do a (gc :tenure) after creating many keys, then they can each be added to eq/eql hash-tables without dirtying them. 3. Try to avoid global-gcs, which force the next access on many eq/eql hash-tables to require rehashes. Of course, if #1 is successful, then global-gc is not a problem. 4. If #1, #2, and #3 are not possible, then perhaps one huge rehash can be done by doing a (gc :tenure) after the table is filled up. Thereafter, one rehash will occur, and only global-gcs will cause further rehashing. I know that this is a complex answer, but we tried to hug the contours of whatever was possible, in order to get the most out of eq/eql hash-table performance. If you have any further questions, please let me know at <franz.com, at bugs> using the same subject line (with the spr number included) and I can answer them. Duane Rettig Franz Inc. http://www.franz.com/ (www) 1995 University Av Ste 275 Berkeley, CA 94704 uunet!franz!duane (uucp) Phone: (510) 548-3600; FAX: (510) 548-8253 <Franz.COM at duane> (internet)