Subject: Re: Upper limits of CL
From: Erik Naggum <erik@naggum.net>
Date: Thu, 20 Jun 2002 03:33:00 GMT
Newsgroups: comp.lang.lisp
Message-ID: <3233532779857997@naggum.net>

* Christopher Browne
| If the O.P. was actually trying to invert matrices of dimension '(20000
| 20000), then they more than likely care somewhat about the amount of memory
| physically available.

  Although I shall not speak of the efficiency issue since I have never done
  anything like this before, it is fairly obvious that all you need to do this
  is some fairly minimal amount of memory (only kilobytes in magnitude) and
  enough disk space to hold one or two copies of this matrix.  Binary I/O with
  objects of the appropriate type is a requirement, of course.  There is also,
  of course, no material difference between this and system-managed virtual
  memory, but it might not even matter that it is not random access if you are
  actually going to traverse the data sequentially.  You could even do the old
  tape-station trick and write a temporary tape/file with redundant data in
  your access pattern if this is faster than random access to disk or virtual
  memory -- there is strong empirical evidence that paging is slower than
  predictable sequential disk access, and you get a free 40G disk with every
  three pairs of sneakers these days.
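
  A minimal sketch of the sequential-access idea above, written in Python
  rather than Common Lisp and not part of the original post.  The function
  names (matrix_bytes, write_matrix, matvec_from_disk) and the row-major
  file layout of raw doubles are assumptions made for illustration.  It
  computes a matrix-vector product, not an inversion, simply to show that a
  matrix kept on disk can be traversed with only one row in memory at a time.
  For a 20,000 by 20,000 matrix of doubles, one copy is 3,200,000,000 bytes on
  disk, while one row is 160,000 bytes.

```python
import os
import tempfile
from array import array

DOUBLE_BYTES = 8  # size of one IEEE 754 double-precision float


def matrix_bytes(n):
    """Bytes needed to store an n-by-n matrix of doubles."""
    return n * n * DOUBLE_BYTES


def write_matrix(path, n, entry):
    """Write an n-by-n matrix to disk, one row at a time, as raw doubles.

    `entry(i, j)` is a hypothetical callback that returns element (i, j).
    """
    with open(path, "wb") as f:
        for i in range(n):
            array("d", (entry(i, j) for j in range(n))).tofile(f)


def matvec_from_disk(path, n, x):
    """Compute y = A x by reading A sequentially, one row at a time.

    Only the current row, x, and y are held in memory.
    """
    y = []
    with open(path, "rb") as f:
        for _ in range(n):
            row = array("d")
            row.fromfile(f, n)  # reads exactly n doubles; EOFError if short
            y.append(sum(a * b for a, b in zip(row, x)))
    return y


if __name__ == "__main__":
    n = 4  # tiny demo size; the access pattern is the same for n = 20000
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "matrix.bin")
        write_matrix(path, n, lambda i, j: float(i * n + j))
        print(matvec_from_disk(path, n, [1.0] * n))
    print(matrix_bytes(20000))  # 3200000000
```

  An actual out-of-core inversion would need a blocked elimination scheme and
  several passes over the file, but each pass could use the same pattern of
  reading a predictable run of rows, working on them, and writing them back.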

  A good engineer would balance the tradeoffs and solve the problem within the
  existing resource constraints.  A theoretical computer scientist would whine
  until he got a big enough machine to implement the mathematical solution with
  the least amount of fussing about the constraints of the real world.  I know
  which one I would hire to get the job done.

  When I hear about these amazing datasets, I keep thinking that they need to
  be generated somehow.  The only applications I know of that require multiple
  gigabytes of memory (and then tens of gigabytes) are those that require
  multiple terabytes of disk space (and then easily tens of terabytes, too).  A
  friend of mine from the U of Oslo once headed the IT department in Statoil
  responsible for geological surveys.  They acquired the first homogeneous park
  of disks capable of 1 terabyte simultaneous access in Norway (they already
  had _hundreds_ of terabytes of "virtual disk space" on enormous tape systems
  that were loaded onto disk as needed).  These datasets were so huge that they
  took many months to accumulate -- they had to send physical _ships_ out to
  sea to scan the seabed, at the cost of a dollar a byte, and the sheer transfer
  time from tape to disk could be a couple of _days_ after the physical tapes had
  also taken several days to move from ship to the computer facility.  Then, of
  course, came the processing of these things.  Dedicated hardware costing many
  millions of dollars churned on the data for weeks on end before it finally
  spit out a colorful map of the seabed that took several days to plot on their
  monster plotters so they could hang them up on the walls, hundreds of meters
  long in all.  Later, computers became powerful enough to do virtual reality
  with real-time 3D navigation over and into the analyzed seabed.  This amazing
  feat was all done in Fortran, unsurprisingly, with software that cost more
  millions of dollars and ran to tens of millions of lines of code developed over
  decades.  This was all worth it because it takes approximately one day of
  actual production from an oil well to pay for all the computing power
  necessary to punch the right hole in the ground.  The whole country of Norway
  can run for twice as long as it takes to pump up the oil, from the _taxation_
  on the _profits_ alone, socialist plan economy and social security and
  defense and everything.  This is not some jerkoff hobbyist whining about not
  getting a 64-bit Common Lisp -- this is big enough to invest 100 man-years to
  develop dedicated programming languages and to spawn moderately large
  companies just to cater to a single industry of a _very_ small number of
  companies.  (We have no mom-and-pop oil rigs in Norway.)  It would be cheaper
  to implement a Common Lisp system of your own for this kind of operation than
  to ask someone else to do it for you.  (In an unrelated industry, that is
  just how Erlang got started.)  The moral of this long-winded example is
  this: If you really need to invert a 20,000 by 20,000 matrix of doubles, and
  it is not just some kind of academic masturbation, the memory and Common Lisp
  and 64-bit hardware and whatever else you can come up with on the IT side are
  going to be minuscule compared to the rest of the costs of the system.  Take
  meteorology -- the first Cray purchased in Norway was used to help our
  fishing industry plan their expeditions to sea.  According to legend, the
  hardware paid for itself the first week it had been in operation, but the
  cost of satellite imaging, telecommunications equipment capable of feeding
  the machine in real time, etc., dwarfed the Cray hardware by two orders of
  magnitude.  Apart from the oil, we have _weather_ in this mountainous coastal
  country.  Billions of dollars have been poured into meteorological prediction
  in this country alone since computers were able to help at all.  Under such
  circumstances, I can fully sympathize with the need for more than 32-bit
  addressing and I appreciate the need for the raw computational power that
  goes into these things, but if this talk about 64-bit implementations is only
  some academic exercise, I actually find it _insulting_ to the brilliant minds
  who have had to do without it.  Besides, if you really need gigabyte upon
  gigabyte of memory and the hardware to utilize it, the only thing between you
  and satisfying that need has been money for at least 10 years.  It's not like
  it's Common Lisp's fault that you haven't had that money -- and if you had
  had it, would you have had anything left over to do anything useful with it
  over a reasonable period of time?  Like, you don't buy a 150-million-dollar
  printing press just because you got an idea about publishing a newspaper --
  you upgrade from the 75-million-dollar printing press when you approach 20
  out of 24 hours running time 7 days a week and want to not run out of hours
  of the day before the new press can be delivered and run in.

  So pardon my cynical twist, but what are you doing with that 20,000×20,000
  double-precision floating point matrix you say you need to invert _today_?
  If you answer "nutt'n, I jus kinda wondered what it'd be like, you know", you
  should be very happy that I am most likely more than 3000 miles away from
  you, or I would come over and slap you hard.

  And if you _are_ doing serious stuff of this magnitude, why do you even
  bother with run-of-the-mill Common Lisp implementations on stock hardware?
  Implement your own goddamn Common Lisp system optimized for your hardware and
  your domain-specific language and other needs.  That was -- after all -- how
  several of the Common Lisp implementations out there got started.  It wasn't
  a miracle then, and it won't be a miracle now.  Just f-ing do it.

[ If I sound grumpy, it is only because I have come across too many idiots of
  the "it can't be done" persuasion lately, the kind of managers who have an
  aquarium in their office because fifteen brains think better than one.  ]
-- 
  Guide to non-spammers: If you want to send me a business offer, please be
  specific and do not put "business offer" in the Subject header.  If it is
  urgent, do not use the word "urgent".  If you need an immediate answer,
  give me a reason, do not shout "for your immediate attention".  Thank you.