Tim Bradshaw <firstname.lastname@example.org> wrote:
| However there's a much more important issue: physics. As machines get
| faster it will become harder and harder to maintain the illusion that
| you can physically do constant-speed access to data, for large enough
| amounts of data, because it's just too far from the processor.
| Machines do amazing tricks already, but I think that for large
| problems you just have to know a lot about memory performance issues,
| and memory (and hence arrays) isn't constant-speed access at all but
| has some very complicated history-dependent model.
Ah, yes, as for example in the SGI Origin 3000 ccNUMA machine, where
(on an otherwise idle system) memory access time (for a whole cache
line on a "read miss") can range from less than 200ns to nearly 1us,
depending on how many "router hops" the access has to go over.
(It can, of course, be a *lot* worse under a heavy MP load...)
And which node a given page is on depends on a large number of
factors, including which CPU first "touched" it (caused it to
be actually allocated, as opposed to merely malloc'd) and memory
pressure from competing jobs. (Not to mention automatic page
migration, if they have that turned on yet.)
Fun stuff. Doing repeatable HPC benchmarks is... "challenging".
Rob Warnock, PP-ASEL-IA <email@example.com>
627 26th Avenue <URL:http://www.rpw3.org/>
San Mateo, CA 94403 (650)572-2607