Subject: Re: time to bring back the Lisp machines?
From: rpw3@rpw3.org (Rob Warnock)
Date: Wed, 03 May 2006 01:12:57 -0500
Newsgroups: comp.lang.lisp
Message-ID: <h6OdnT6DPfd01MXZnZ2dneKdnZydnZ2d@speakeasy.net>
Robert Swindells <rjs@fdy2.demon.co.uk> wrote:
+---------------
| There is also the FPGA in an Opteron socket that was described in
| a thread last week.
+---------------

Ex-*SPEND*-sive!! [Pardon my deliberate misspelling.]

+---------------
| You might be able to use the HyperTransport protocol to snoop
| the caches of the real processors and use this to implement
| write barriers without the overhead of having to trap to user space.
+---------------

1. I suspect the aforementioned FPGA uses plain ol' "non-coherent HT",
   which is what is used for ordinary I/O (PIOs & DMA). The cc/NUMA
   cache-coherency, on the other hand, uses the AMD "coherent HT"
   protocol, and the latter is still quite confidential, AFAIK.
   [Any given link runs in either "non-coherent" or "coherent" mode.
   HT links to I/O chips such as the AMD-8111, AMD-8131/AMD-8132, or the
   NVIDIA nForce run in "non-coherent" mode; links between Opteron
   CPUs run in "coherent" mode.] Not even NDA partners get *all* the
   details about "coherent" mode [though they do get some]. Before you
   could build an FPGA that spoke the cc/NUMA protocol, you'd have to
   succeed in some *serious* negotiating with AMD for NDA access.

   However, some people obviously have succeeded, e.g., consider
   the Horus chip from Newisys:
   http://www.hypertransport.org/docs/tech/horus_external_white_paper_final.pdf

2. Even assuming success in #1, it's not at all clear that there is
   adequate information available in another socket to implement a GC
   write barrier. Write barriers have to do their work *NOW*, not at
   some later time when the trail has already been thoroughly muddled.
   Opterons have "write-back" caches, not "write-through", and the
   dirty data can be retired in random order.

3. And while you'll see the SNOOP broadcasts from other CPUs asking
   about *your* dirtying of a given cache line, you won't necessarily
   see what data is actually stored by *them* (which happens later
   via a potentially different path through the HT fabric).

   Worse, if the store is being done into an Opteron's local SDRAM,
   and the local memory controller knows from recent history that
   no-one else has the data cached [e.g., right after a SNOOP broadcast
   gets its responses back], then *no-one* else except that local
   Opteron will ever *see* what data was stored -- which makes an
   external write barrier impossible.
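
To make points 2 and 3 concrete, here is a toy model, not a description
of real Opteron behaviour. Everything in it is an assumption made for
illustration: the names `WriteBackCache` and `run`, the 64-byte line size,
the two-line capacity, and LRU eviction (real hardware retires dirty lines
in an order you cannot predict). It compares what an inline software
barrier records at store time with what an observer outside the CPU could
see, namely only the write-backs of evicted dirty lines.

```python
from collections import OrderedDict

LINE_SIZE = 64  # bytes per cache line (assumed for this sketch)


class WriteBackCache:
    """Toy write-back cache: a store stays local until its line is evicted."""

    def __init__(self, capacity_lines):
        self.capacity = capacity_lines
        self.lines = OrderedDict()  # line number -> dirty flag, kept in LRU order
        self.bus_log = []           # line numbers an external observer gets to see

    def store(self, addr):
        line = addr // LINE_SIZE
        if line in self.lines:
            # Hit: the store is absorbed by the cache, nothing leaves the CPU.
            self.lines.move_to_end(line)
        elif len(self.lines) >= self.capacity:
            # Miss with a full cache: evict the least recently used line.
            victim, dirty = self.lines.popitem(last=False)
            if dirty:
                # Only now does the earlier store become externally visible.
                self.bus_log.append(victim)
        self.lines[line] = True  # the line is now present and dirty


def run(stores, capacity_lines=2):
    """Return (what an inline barrier saw, what an external observer saw)."""
    cache = WriteBackCache(capacity_lines)
    inline_log = []
    for addr in stores:
        inline_log.append(addr)  # software barrier: does its work at store time
        cache.store(addr)
    return inline_log, cache.bus_log


if __name__ == "__main__":
    inline_log, bus_log = run([0, 8, 64, 0, 128, 192])
    print("inline barrier saw stores to:", inline_log)
    print("external observer saw lines: ", bus_log)
```

With the store sequence above, the inline barrier records all six stores
in program order, while the external observer sees only lines 1 and 0, in
that order: the three stores to line 0 are coalesced into one late
write-back, and the stores to lines 2 and 3 have not left the cache at all.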


-Rob

-----
Rob Warnock			<rpw3@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607