From: o (Stephan Oepen)

Subject: tuning image size

Date: 1999-8-31 10:07

hi everybody,

i am trying to tune an Allegro CL 5.0 image.  the application is a
(constraint-based) natural language processing system with some 50
mbytes or more of static data (code, grammar, lexicon).  at run-time
the system is used to process sentences, one at a time.  parsing a
sentence can require several hundred mbytes of dynamic data; most
of this data (i.e. partial analyses computed so far) is active until
the parser completes processing that sentence; then everything turns
into garbage.

the reasoning i applied in choosing new- and oldspace size and a gc()
strategy is to keep all the dynamic data during a parse in newspace.
given my process size of 300 -- 500 mbytes, a global gc() can take a
minute or more; keeping dynamic data in newspace seems to come with at
least two advantages:

  - one-pass scavenges are quick; even with some 100 mbytes of active
    (dynamic) data being copied, each scavenge completes in just a few
    seconds; overall gc() efficiency is around 70 %;

  - avoiding global gc()s means that no time is wasted on inspecting
    the static data over and over again to find it is still static (i
    wish i could allocate objects forever and put them aside somehow).

my problem, however, is the following: it seems there is a hard limit
imposed on newspace size in Allegro CL.  from recent experiments, i get
the impression that it cannot exceed most-positive-fixnum bytes (0.5 gbyte);
as my Lisp process expands newspace beyond that limit, it first reports
negative amounts of data scavenged and then dies with a fatal gsgc()
error.  in turn, resize-areas() and the build-lisp-image() :newspace
argument both fail miserably when used to request bigger newspace.  i
think this should be discussed somewhere in the documentation but could
not find it.  ideally, Allegro CL should not grow newspaces bigger than
what the garbage collector can handle.
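
for reference, the arithmetic behind the 0.5 gbyte figure can be sketched
as follows.  this is only an illustration under stated assumptions: a
32-bit image with 30-bit signed fixnums, and the guess (not confirmed by
any documentation i have seen) that byte counts past that range wrap
around, which would fit the negative amounts reported.  all names below
are made up for the example.

```lisp
;;; assumption: 32-bit Lisp with 30-bit signed fixnums; a 64-bit image
;;; would report a much larger most-positive-fixnum.
(defparameter *assumed-fixnum-bits* 30)   ; hypothetical, for illustration

(defun assumed-fixnum-limit ()
  "Largest value of a signed fixnum with *assumed-fixnum-bits* bits."
  (1- (expt 2 (1- *assumed-fixnum-bits*))))

(defun wrap-to-fixnum (n)
  "Interpret N as a two's-complement number of *assumed-fixnum-bits* bits.
this models the guessed overflow behaviour, not actual gc() internals."
  (let* ((modulus (expt 2 *assumed-fixnum-bits*))
         (low (mod n modulus)))
    (if (>= low (/ modulus 2))
        (- low modulus)
        low)))
```

under these assumptions (assumed-fixnum-limit) is 536870911, i.e. one
byte short of 512 mbytes, and a count of 512 mbytes plus 100 bytes passed
to wrap-to-fixnum comes out negative.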

now for the more interesting question: to circumvent this problem, i
would like to prevent growth of newspace beyond that limit and make my
application throw() out of the current computation when some amount of
dynamic data has been accumulated.  the obvious approach seems to be to
work this into the *gc-after-hook*: test how many bytes were scavenged
at that point and take appropriate action.  however, with the stock gc()
parameters and switches it could well be too late at this point: the
scavenge that just completed may have caused growth of newspace beyond
the (alleged) hard limit ...
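
a minimal sketch of the hook idea might look as follows.  it assumes that
the hook is called with five arguments (global-p, bytes copied to
newspace, bytes copied to oldspace, efficiency, bytes still to be
allocated); that calling convention should be checked against the
documentation.  the budget value, the :parse-abort tag and all function
names are hypothetical.  rather than throw() from inside the hook itself,
the sketch only sets a flag that the parser polls at a safe point.

```lisp
(defparameter *scavenge-budget* (* 400 1024 1024)
  "hypothetical cap, in bytes, on data surviving one scavenge.")

(defvar *over-budget-p* nil
  "set by the hook, polled and cleared by the parser.")

(defun note-scavenge (global-p to-new to-old efficiency to-be-allocated)
  "candidate gc-after-hook; the five-argument signature is an assumption."
  (declare (ignore global-p efficiency to-be-allocated))
  (when (> (+ to-new to-old) *scavenge-budget*)
    (setf *over-budget-p* t)))

(defun check-budget ()
  "call from the parser's main loop; abandons the current parse if the
last scavenge copied more than *scavenge-budget* bytes."
  (when *over-budget-p*
    (setf *over-budget-p* nil)
    (throw :parse-abort :out-of-space)))

(defmacro with-parse-budget (&body body)
  "run BODY; returns :out-of-space if check-budget aborted it."
  `(catch :parse-abort
     (setf *over-budget-p* nil)
     ,@body))

;; installation is Allegro-specific and skipped by other Lisps
#+allegro
(setf excl:*gc-after-hook* #'note-scavenge)
```

the parser would then wrap each sentence in with-parse-budget and call
check-budget once per unit of work.  this does not remove the timing
problem: the flag is only set after the scavenge has completed.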

another strategy that occurred to me would be to build an image with
maximal initial newspace and disable newspace growth; i expect that
:free-bytes-new-other, :expansion-free-percent-new and others could be
used to effectively disable newspace growth.  but then, the system may
find itself unable to allocate a new object, even after a scavenge.

can Franz developers confirm my suspicion about the newspace limit?
can you recommend a strategy to avoid the problem (without having to
enable tenuring again)?  are there Allegro users that have similar
experiences with their applications?  any comments or help will be
greatly appreciated.

                                        thanks in advance  -  oe

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; Stephan Oepen + Postfach 15 11 5o + 66o41 Saarbruecken + (+681) - 3o2 4176
;;;        --- <coli.uni-sb.de at oe> --- http://www.coli.uni-sb.de/~oe ---
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;