Subject: Re: How to avoid GC in tight numeric test loop? (CMUCL)
From: rpw3@rpw3.org (Rob Warnock)
Date: Sat, 21 Jun 2008 08:15:30 -0500
Newsgroups: comp.lang.lisp
Message-ID: <fI6dnZtOUvVvY8HVnZ2dnUVZ_t_inZ2d@speakeasy.net>
Madhu <enometh@meer.net> wrote:
+---------------
| Rob Warnock <rpw3@rpw3.org> wrote:
| | Nope, tried all kinds of type declarations -- doesn't help. The
| | problem is, as I said, a typo in "cmucl/src/compiler/float-tran.lisp",
| | in the #+RANDOM-MT19937 (DEFTRANSFORM RANDOM). Two of the COND terms
| | need to be swapped, namely, this code, which does a premature GIVE-UP
| | [because the RANDOM-FIXNUM-MAX is smaller than (EXPT 2 32), so the
| | second branch can never be taken!]:
| [snip]
| |     cmu> (defvar *foo* 429496729)
| |
| |     *FOO*
| |     cmu> (and (> *foo* kernel:random-fixnum-max)
| | 	      (< *foo* (expt 2 32)))
| |
| |     T
| 
| Ah but youve chosen *foo* FIXNUM.
| (fixnump *foo*) => T
+---------------

True enough, though it *was* bigger than KERNEL:RANDOM-FIXNUM-MAX,
which was all I really needed to show that the patch was working.  ;-}

+---------------
| I think The no-consing behaviour you observe [after patching
| compiler/float-tran.lisp] is not because of 32bit inlining but
| because your values lie in the fixnum range.
+---------------

Yes, but no, not entirely. Yes, if you pick a number larger than
a fixnum (but 2^32 or smaller) and type the intermediate variable
with (INTEGER 1 4294967296), you do still get lots of consing
[~168 bytes per iteration], presumably because it uses bignum
arithmetic to get it into a register. The same thing happens
if you type the intermediate variable with (UNSIGNED-BYTE 32),
but to a lesser degree [only ~86 bytes per iteration].

But interestingly [the "no, not entirely" case], if you pre-decrement
the argument, type the intermediate variable with (UNSIGNED-BYTE 32),
and then pass it incremented to RANDOM, the consing goes away again,
regardless of the value of *FOO* [provided, of course, that you
don't attempt values outside the range (INTEGER 1 4294967296)]:

    cmu> (load "float-tran-patch")

    ; Loading #P"/u/rpw3/float-tran-patch.x86f".
    T
    cmu> (setf *foo* (expt 2 32))       ; [already DEFVAR'd earlier]

    4294967296
    cmu> (time (let ((bar (1- *foo*))) ; so as not to violate the U32
		 (declare (type (unsigned-byte 32) bar)) 
		 (dotimes (i 1000000)
		   (random (the (unsigned-byte 32) (1+ bar))))))
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 

    ; Evaluation took:
    ;   0.02f0 seconds of real time
    ;   0.015088f0 seconds of user run time
    ;   0.0f0 seconds of system run time
    ;   34,611,940 CPU cycles
    ;   0 page faults and
    ;   0 bytes consed.
    ; 
    NIL
    cmu> 

Or for something midway between a fixnum and 2^32:

    cmu> (setf *foo* (floor (- (expt 2 32) most-positive-fixnum) 2))

    1879048192
    cmu> (fixnump *foo*)

    NIL
    cmu> (time (let ((bar (1- *foo*)))
		 (declare (type (unsigned-byte 32) bar)) 
		 (dotimes (i 1000000)
		   (random (the (unsigned-byte 32) (1+ bar))))))
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 

    ; Evaluation took:
    ;   0.02f0 seconds of real time
    ;   0.015289f0 seconds of user run time
    ;   0.001162f0 seconds of system run time
    ;   37,957,864 CPU cycles
    ;   0 page faults and
    ;   0 bytes consed.
    ; 
    NIL
    cmu> 

I think this works because the compiler is smart enough to know
that one plus a U32 is the same as (INTEGER 1 4294967296), which is
the preferred RANDOM arg type to do the special-casing on X86 platforms.

You can also avoid the consing the following way, though I'm not sure
if it's less or more ugly than the above!! ;-}

    cmu> (time (if (= *foo* 4294967296) 
		   (dotimes (i 1000000)
		     (random 4294967296)) ; special-cased in the DEFTRANSFORM
		   (let ((bar *foo*))
		     (declare (type (integer 1 4294967295) bar))
		     (dotimes (i 1000000)
		       (random bar))))) 
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 

    ; Evaluation took:
    ;   0.02f0 seconds of real time
    ;   0.015329f0 seconds of user run time
    ;   0.0f0 seconds of system run time
    ;   35,832,596 CPU cycles
    ;   0 page faults and
    ;   0 bytes consed.
    ; 
    NIL
    cmu> 

Again, the typing in the second branch lets the compiler know it
can keep BAR in a register and not need generic bignum arithmetic
on it.

+---------------
| | But the DEFTRANSFORM has special code on x86 platforms to inline
| | 32-bit random numbers [that is, an arg to RANDOM of less than or
| | equal to 2^32]. Are you suggesting that [even after the above bug-fix]
| | said special-case code is in fact wrong?!?
| 
| The issue IIUC is in using (REM (random-chunk) n) to return a random
| number in the range [0, N-1],    (as Raymond Toy pointed out:)
| 
| (random-chunk) is uniformly distributed in the range
|  [0, (expt 2 random-chunk-legth)] 
| 
| [The mersenne twister in cmucl uses a random-chunk-length = 32.]
+---------------

Actually, (random-chunk) is of type (UNSIGNED-BYTE 32), and so is is
uniformly distributed in the range [0, (expt 2 random-chunk-length)),
using half-open intervals, or [0, (1- (expt 2 random-chunk-length))]
using closed intervals.

Also, I suspect Ray may have forgotten that I was specifically talking
about the X86 platform. That (REM (RANDOM-CHUNK) N) code is under a #-X86
conditional, and won't be used. And in any event, that section of code
is only used for the case of a compile-time *constant*, and even then,
only when the constant is *not* exactly 2^32.

For the cases I was trying to fix -- the arg is a variable in [1, 2^32] --
a different section of code is used, which on the X86 platform does
(VALUES (BIGNUM::%MULTIPLY (RANDOM-CHUNK ...) ARG)) instead of the
bignum REM, where BIGNUM::%MULTIPLY is an inlined VOP that takes two
32-bit numbers and returns a 64-bit result in two 32-bit registers --
that is, doesn't cons.

Bottom line, if you stand on your head & spin around three times ;-} ;-}
you can get CMUCL on X86 to serve up pseudo-random numbers up to
and including 32 bits long without consing.


-Rob

p.s.
+---------------
| So there is a small bias in the generated numbers in the range 
| [0, (mod (expt 2 random-chunk-length) n)]
| 
| I'm not an expert, so please correct me if I'm wrong.  
+---------------

Does this bias still exist if you use the correct type for
RANDOM-CHUNK, which on X86 is (UNSIGNED-BYTE 32)??

-----
Rob Warnock			<rpw3@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607