Subject: Re: How to avoid GC in tight numeric test loop? (CMUCL)
From: rpw3@rpw3.org (Rob Warnock)
Date: Fri, 20 Jun 2008 21:59:16 -0500
Newsgroups: comp.lang.lisp
Message-ID: <Yc-dnfNLKf4Z88HVnZ2dnUVZ_u-dnZ2d@speakeasy.net>
Raymond Toy (RT/EUS) <raymond.toy@ericsson.com> wrote:
+---------------
| >>>>> "Rob" == Rob Warnock <rpw3@rpw3.org> writes:
| Rob> Robert Maas, <jaycx2.3.calrobert@spamgourmet.com.remove> wrote:
| Rob> |   1945120 bytes consed.
| Rob> +---------------
| 
| Rob> See my parallel reply
|      <news:A6Odnafh_vwEdczVnZ2dnUVZ_qXinZ2d@speakeasy.net>.
| Rob> The problem isn't the value of the arg to RANDOM per se, it's
| that unless
| Rob> the arg is a literal number the #+RANDOM-MT19937 (DEFTRANSFORM RANDOM)
| Rob> inline expander in "cmucl/src/compiler/float-tran.lisp" isn't seeing
| Rob> the arg as one which will return an (UNSIGNED-BYTE 32) or smaller, and
| Rob> thus is punting to the generic function call of RANDOM... which conses
| Rob> some 80-96 bytes per call [it varies].
| 
| I think it would help if you declared arrsiz.  As it is now, the
| compiler has no way of knowing the type of arrsiz and thus can't apply
| the transform.
+---------------

Nope, tried all kinds of type declarations -- doesn't help. The problem
is, as I said, a typo in "cmucl/src/compiler/float-tran.lisp", in the
#+RANDOM-MT19937 (DEFTRANSFORM RANDOM). Two of the COND terms need to
be swapped, namely, this code, which does a premature GIVE-UP [because
the RANDOM-FIXNUM-MAX is smaller than (EXPT 2 32), so the second branch
can never be taken!]:

      ((> num-high random-fixnum-max)
       (give-up "The range is too large to assure an accurate result."))
      #+x86
      ((< num-high (expt 2 32))
       '(values (bignum::%multiply (random-chunk (or state *random-state*))
		 num)))

Those clauses need to be swapped, viz.:

      #+x86
      ((< num-high (expt 2 32))
       '(values (bignum::%multiply (random-chunk (or state *random-state*))
		 num)))
      ((> num-high random-fixnum-max)
       (give-up "The range is too large to assure an accurate result."))

And indeed, when I compile a (DEFTRANSFORM RANDOM) patch with that swap,
I see the following behavior. First, before patching:

    cmu> (list (lisp-implementation-type) (lisp-implementation-version))

    ("CMU Common Lisp" "19c Release (19C)")
    cmu> (defvar *foo* 429496729)

    *FOO*
    cmu> (and (> *foo* kernel:random-fixnum-max)
	      (< *foo* (expt 2 32)))

    T
    cmu> (time (let ((bar *foo*))
                 (declare (type (integer 1 500000000) bar))
                 (dotimes (i 1000000)
		   (random bar))))
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 
    ; [GC threshold exceeded with 13,319,016 bytes in use.  Commencing GC.]
    ; [GC completed with 1,376,224 bytes retained and 11,942,792 bytes freed.]
    ; [GC will next occur when at least 13,376,224 bytes are in use.]
    ...[*lots* more of that trimmed]...
    ; Evaluation took:
    ;   0.68f0 seconds of real time
    ;   0.650193f0 seconds of user run time
    ;   0.0f0 seconds of system run time
    ;   1,489,031,952 CPU cycles
    ;   [Run times include 0.13f0 seconds GC run time]
    ;   0 page faults and
    ;   94,013,600 bytes consed.
    ; 
    NIL
    cmu> 

You will note that declaring the type of BAR did not reduce
the consing.

Now we try the patch:

    cmu> (load "float-tran-patch")

    ; Loading #P"/u/rpw3/float-tran-patch.x86f".
    T
    cmu> (time (let ((bar *foo*))
                 (declare (type (integer 1 500000000) bar))
                 (dotimes (i 1000000)
		   (random bar))))
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 

    ; Evaluation took:
    ;   0.01f0 seconds of real time
    ;   0.015104f0 seconds of user run time
    ;   0.0f0 seconds of system run time
    ;   34,939,968 CPU cycles
    ;   0 page faults and
    ;   0 bytes consed.
    ; 
    NIL
    cmu> 

*BINGO!* The special inline code got used in the latter case
[as one can show by compiling/disassembling the bodies of the TIME].

+---------------
| Also, Madhu has mentioned to me that random isn't very good for large
| integers near 2^32.  It uses modulo to convert the 32-bit internal
| random number to the desired range.  This produces a bias.  I think
| this is why random-integer-extra-bits and random-fixnum-max exists.  A
| rejection technique would probably be better.
+---------------

But the DEFTRANSFORM has special code on x86 platforms to inline
32-bit random numbers [that is, an arg to RANDOM of less than or
equal to 2^32]. Are you suggesting that [even after the above bug-fix]
said special-case code is in fact wrong?!?


-Rob

-----
Rob Warnock			<rpw3@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607