Subject: Re: How to avoid GC in tight numeric test loop? (CMUCL)
From: rpw3@rpw3.org (Rob Warnock)
Date: Fri, 13 Jun 2008 04:39:46 -0500
Newsgroups: comp.lang.lisp
Message-ID: <8dmdnc_m0Kv_3c_VnZ2dnUVZ_rrinZ2d@speakeasy.net>
Robert Maas, <jaycx2.3.calrobert@spamgourmet.com.remove> wrote:
+---------------
| > From: r...@rpw3.org (Rob Warnock)
| (Snipping most of your detours, keeping only the part I can use:)
| > Well, when I tried it on CMUCL-19c, when given a literal argument
| > [but much more about that below!] RANDOM didn't cons at all:
| > First & foremost, Robert forgot to *compile* ONE-ACCESS!!
| 
| But the REP, after I define a function, when I call it the first
| time, it prints out a message *saying* that's compiling it!! So I
| just assumed it was telling me the truth. I guess CMUCL lied to me
| and I believed it. Oh well.
+---------------

No, CMUCL is *not* "lying" -- you're misreading it! Consider:

    cmu> (defun slow () (dotimes (i 1000000)))

    SLOW
    cmu> (time (slow))
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 
    ; [GC threshold exceeded with 13,447,000 bytes in use.  Commencing GC.]
    ; [GC completed with 1,493,040 bytes retained and 11,953,960 bytes freed.]
    ; [GC will next occur when at least 13,493,040 bytes are in use.]
    ; [GC threshold exceeded with 13,506,600 bytes in use.  Commencing GC.]
    ; [GC completed with 1,511,296 bytes retained and 11,995,304 bytes freed.]
    ; [GC will next occur when at least 13,511,296 bytes are in use.]
    ; [GC threshold exceeded with 13,524,856 bytes in use.  Commencing GC.]
    ; [GC completed with 1,503,112 bytes retained and 12,021,744 bytes freed.]
    ; [GC will next occur when at least 13,503,112 bytes are in use.]
    ; [GC threshold exceeded with 13,516,672 bytes in use.  Commencing GC.]
    ; [GC completed with 1,513,176 bytes retained and 12,003,496 bytes freed.]
    ; [GC will next occur when at least 13,513,176 bytes are in use.]

    ; Evaluation took:
    ;   3.05f0 seconds of real time
    ;   3.029525f0 seconds of user run time
    ;   5.f-6 seconds of system run time
    ;   6,760,026,224 CPU cycles
    ;   [Run times include 0.07f0 seconds GC run time]
    ;   0 page faults and
    ;   48,037,040 bytes consed.
    ; 
    NIL
    cmu> 

Did it compile the function SLOW?!?  **NO!!** It *only* compiled
the body of the TIME form! If you want the SLOW function itself
to get compiled, you have to do it yourself:

    cmu> (compile 'slow)
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 

    SLOW
    NIL
    NIL
    cmu> (time (slow))
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 

    ; Evaluation took:
    ;   0.0f0 seconds of real time
    ;   0.001365f0 seconds of user run time
    ;   0.0f0 seconds of system run time
    ;   3,011,760 CPU cycles
    ;   0 page faults and
    ;   0 bytes consed.
    ; 
    NIL
    cmu> 

Again, the TIME form only compiled the body of the TIME form,
namely, the call to SLOW. But since the second time SLOW was now
a compiled function, things went a *lot* faster and consed nothing.

+---------------
| > But there's a more subtle problem, probably CMUCL-specific, which
| > is that ONE-ACCESS is calling (RANDOM ARRSIZ), which, despite being
| > a (near-)constant here, seems to cause consing. Compare this:
| >     cmu> (time (dotimes (i 10000) (random 268435456)))
| ..
| >     ;   0 bytes consed.                  <== NO CONSING!!
| 
| Hmm, that makes no sense to me. If there's a global integer, all
| nicely boxed as a first-class-citizen object, the allocation of it
| was already done *once* when I did the SETQ or DEFPARAMETER etc. No
| further allocation should be needed to *unbox* it, i.e. extract the
| known-fixed-precision integer within it. CMUCL is dumb!
+---------------

No, CMUCL is "too smart for its own good sometimes".  ;-}  ;-}

The issue is that if the compiler inline expander (DEFTRANSFORM RANDOM)
*doesn't* inline your RANDOM call, and your arg is not less than
KERNEL:RANDOM-FIXNUM-MAX, then a very generic version of RANDOM
[internal function %RANDOM-INTEGER] gets called which immediately
allocates a (KERNEL:RANDOM-CHUNK *RANDOM-STATE*), which is almost
always a BIGNUM [albeit a small one], hence the allocation.

+---------------
| So I need to manually copy and paste from what (expt ...) returns
| into the *SOURCE* code of my function definition before I
| READ-EVAL-PRINT-COMPILE it. (And of course I need to make sure the
| *literal* argument to RANDOM is within the bounds in the first place.)
+---------------

Well, if you're using a literal [and are on 18d or later and/or(?)
are using the Mersenne Twister RANDOM (MT19937)], then you can go
all the way up to (EXPT 2 32) without consing:

    cmu> (time (dotimes (i 1000000) (random 4294967296)))
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 

    ; Evaluation took:
    ;   0.02f0 seconds of real time
    ;   0.015109f0 seconds of user run time
    ;   0.0f0 seconds of system run time
    ;   35,257,584 CPU cycles
    ;   0 page faults and
    ;   0 bytes consed.
    ; 
    NIL
    cmu> 

+---------------
| (defun one-random ()
|   (the (unsigned-byte 21)
|     (random (the (unsigned-byte 21)
|               1048576)))) ;...copied from above
| (compile *)
| 
| (defun many-randoms ()
|   (dotimes (k 50000)
|     (declare (type (unsigned-byte 21) k))
|     (the (unsigned-byte 21)
|       (one-random))))
| (compile *)
+---------------

Note: You don't have to declare the types of either K or the
RANDOM call here -- the CMUCL compiler will figure those out
automatically [it has pretty good dataflow for constants and
"known functions", which includes those earlier in the same file].
That is, this gives the same non-consing results:

    (defun one-random () (random 1048576))
    (compile *)
    (defun many-randoms () (dotimes (k 50000) (one-random)))
    (compile *)
    (time (many-randoms))
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 

    ; Evaluation took:
    ;   0.0f0 seconds of real time
    ;   0.00115f0 seconds of user run time
    ;   0.0f0 seconds of system run time
    ;   2,535,624 CPU cycles
    ;   0 page faults and
    ;   0 bytes consed.
    ; 
    NIL

+---------------
| One thing that bothers me is needing to manually copy&paste the new
| parameter to RANDOM and re-define and re-compile each time I change
| the size of the array. Given the array size is the product of *two*
| arguments to RANDOM, it's too easy to make a mistake.
+---------------

If you can keep the size of the argument to RANDOM less-than-or-equal
the value of KERNEL:RANDOM-FIXNUM-MAX on your system [which is 4194303
on CMUCL-18d & later, but only 524287 on earlier versions (probably
including your 18b!)], then it still won't cons:

    (defparameter *foo* kernel:random-fixnum-max)
    (defun one-random () (random *foo*))
    (compile *)
    (defun many-randoms () (dotimes (k 50000) (one-random)))
    (compile *)
    (time (many-randoms))
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 

    ; Evaluation took:
    ;   0.0f0 seconds of real time
    ;   0.002342f0 seconds of user run time
    ;   0.0f0 seconds of system run time
    ;   6,231,240 CPU cycles
    ;   0 page faults and
    ;   0 bytes consed.
    ; 
    NIL

But if you can't keep the size of the argument to RANDOM under that
limit, and you can't use a literal, then you'll have to either fix
the bug in "cmucl/src/compiler/float-tran.lisp" (DEFTRANSFORM RANDOM)
yourself or wait for someone else to fix it.

+---------------
| I'll need to find a way to make a DEFUN-and-COMPILE factory that
| takes a parameter of array size and automatically builds the correct
| s-expression for the DEFUN and then EVALs then COMPILEs it.
+---------------

EWWWWW! Good luck with that!  :-{

+---------------
| One way is to print the s-expression to a string, substitute the literal
| text within that string, then READ-FROM-STRING to get the new
| s-expression. The amazing thing is that in Lisp that monstrosity is
| actually doable!!
+---------------

Hunh?!? "Strings? *STRINGS?!?* This is Lisp!
We don' need no stinkin' strings!"[1]

    cmu> (defun random-builder (arg)
	   (let ((*compile-print* nil))
	     (compile 'one-random `(lambda () (random ,arg)))
	     (compile 'many-randoms
		      `(lambda () (dotimes (k 50000) (one-random))))
	     nil))
    RANDOM-BUILDER
    cmu> (compile *)
    ; Compiling LAMBDA (ARG): 
    ; Compiling Top-Level Form: 

    RANDOM-BUILDER
    NIL
    NIL
    cmu> (random-builder (* 16384 16384))

    NIL
    cmu> (time (many-randoms))
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 

    ; Evaluation took:
    ;   0.0f0 seconds of real time
    ;   0.00117f0 seconds of user run time
    ;   0.0f0 seconds of system run time
    ;   2,579,792 CPU cycles
    ;   0 page faults and
    ;   0 bytes consed.
    ; 
    NIL
    cmu> (> (* 16384 16384) kernel:random-fixnum-max)

    T
    cmu> 


-Rob

[1] Apologies to "The Treasure of the Sierra Madre".

-----
Rob Warnock			<rpw3@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607