From ... From: Erik Naggum Subject: Re: Theory #51 (superior(?) programming languages) Date: 1997/01/26 Message-ID: <3063227553270898@naggum.no>#1/1 X-Deja-AN: 212224862 references: <5c5c65$9ed@news-rocq.inria.fr> <3063055314263603@naggum.no> <32E8FEC9.1F11@netright.com> <3063147946409477@naggum.no> <1997Jan2506.10.34.16383@koobera.math.uic.edu> mail-copies-to: never organization: Naggum Software; +47 2295 0313; http://www.naggum.no newsgroups: comp.arch,comp.lang.lisp,comp.lang.scheme * D. J. Bernstein | Expand the tail recursion in C, like this--- : | ---and you get a 5x speedup. That's more than twice as fast as your | proud example of fast LISP code. just to be entirely clear on this, the code in the "proud example" was posted by a fool who posted both the C and the Lisp code to debunk Lisp's speed. he was indeed very proud of his code, and he made no attempt to speed up the C code once he was shown that CMUCL outdid it with a factor of 2. you're not answering the question as asked if you change the premises they way you have done. but to show his state of mind, let me include a sentence from his message (translated): Execution time under CLISP on my old Pentium/Linux: 140 seconds. After compiling the stuff, however, I got it down to 11 seconds. he was comparing _interpreted_ code to his compiled C code! you can judge his brilliance from this alone. his refusal to acknowledge that CLISP ran byte-code was the reason I wanted to beat up his example. however, your challenge was much tougher. | Of course, it's bad to use the stack for temporary space. Allocate what | you need at the beginning, handling out-of-memory explicitly rather than | crashing, and use some obvious dynamic programming--- : | ---for another 500x speedup. How would you translate that to LISP? How | fast is the result? since both the Lisp and the C versions are very fast, I ran a million passes over each version to get accurate timing. (SunOS 4 has some odd notions of internal clock speed, measuring in 100Hz and reporting i 60Hz, thereby losing a lot of precision.) your C code used 78 s, but allocated 36M of memory and needed 7 s of system time to do so. a call to free(u) and a slight rewrite of the value-returning form increased the time to 82 s, but reduced the system time to less than .3 seconds. using `alloca' to allocate on the stack, instead, it's down to 66 s. the Lisp version uses 104 s, plus 1.1 s of garbage collection and .3 s of system time. it had consed a total of 48M. (experiments showed that zeroing a pre-allocated vector was _much_ slower than allocating fresh memory.) here's the code I used for the comparisons: /* binomial.c -*- mode: c -*- 35453.668341 */ int binomial(int n, int k) { int i, j, *u; if (k > n) return 0; i = n - k; if (k > i) k = i; if (!k) return 1; u = (int *) alloca((k + 1) * sizeof (int)); if (!u) return -1; u[0] = 1; for (i = 1; i <= n; ++i) for (j = k; j > 0; --j) u[j] += u[j - 1]; return u[k]; } /* binomial.c ends */ #| binomial.cl -*- mode: common-lisp -*- 35453.668456 |# (in-package :user) (defun binomial (n k) (declare (fixnum n k)) (locally (declare (optimize (speed 3) (safety 0) (debug 0))) (unless (and (plusp k) (plusp n)) (error "No possible result")) (when (> k n) (cerror "Evaluate (binomial ~*~D ~0@*~D) instead" "Reversed arguments in (binomial ~D ~D)?" n k) (rotatef n k)) (setq k (min k (- n k))) (when (zerop k) (return-from binomial 1)) (let* ((j 0) (u (make-array (1+ k) :initial-element 0))) (declare (fixnum j) (type (vector fixnum) u)) (setf (svref u 0) 1) (tagbody outer (setq j k) inner (incf (svref u j) (svref u (1- j))) (unless (zerop (decf j)) (go inner)) (unless (zerop (decf n)) (go outer))) (svref u k)))) #| binomial.cl ends |# there may be ways to improve this further; I don't have time to sit down and _really_ tune this code (the above was based on a couple hunches and a little experimentation, and took me about 40 minutes to tweak this into place, plus about half an hour CPU time for tests running so long at a time I could get some real work done in between.) clearly, this is an immense improvement over the previous code, a factor of 43,100! (I have been told that this is because the SPARC CPU (or my version of it) handles recursion badly due to limited register windows. I'm seeking more information on this.) #\Erik -- 1,3,7-trimethylxanthine -- a basic ingredient in quality software.