Subject: Re: Could CDR-coding be on the way back?
From: Erik Naggum <erik@naggum.net>
Date: 14 Dec 2000 03:17:39 +0000
Newsgroups: comp.lang.lisp,comp.arch
Message-ID: <3185752659290095@naggum.net>

* Per Bothner <per@bothner.com>
| Perhaps, but does it make sense?

  Excuse me?  Yes, it makes sense to read Lisp data that the application
  does not know the structure of a priori.  For one thing, source _code_
  matches that description.  Data whose structure is only evident after
  having been read in as a complete expression also matches very well.

| Presumably you application has some idea of the structure of the data
| it is expecting?  (Even trees have structure.)

  Yes, it knows they will all be Lisp objects.  That's it.  This is part
  of the significant charm with Lisp's lists.

| But that is just my argument - it is *not* much less efficient.  If
| you double the array as it grows, the "average" element is only copied
| *a single time*.  (Some elements may be copied n*log(n) times, but at
| least half the elements have been copied at most once, at least
| three-quarter of the elements have been copied at most twice, etc.)

  There's something wrong with your math.  The average number of times
  an element is copied is 1.0 if you have (- (1+ (expt 2 n)) m) elements
  in the final array with m elements in the initial allocation and
  approaches 2.0 asymptotically for (1+ (expt 2 n)) elements in the
  final array, since adding the last element will copy everything and
  end up with half the allocated array empty (short that one element).

| My argument is a little of all of:
| 
| (1) I think high-level programming languages should ...
| (2) I think programmers should ...
| (3) As a consequence of (1) and (2), ...

  I'd be interested in how you _arrived_ at your preferences, instead of
  just having something follow from some random personal favorites.

| (Of course CL is object-oriented.  However, it differs from Java in
| that certain pre-defined types are used very heavily so it makes sense
| to optimize for those types in a way that makes less sense for Java.)

  This is where you're losing me.  I don't see a huge need for making a
  basic type more efficient in languages that _don't_ show a predominant
  use of that basic type.  And why are we discussing Java?  I thought it
  was already as array-friendly as you wanted languages to be?  And Lisp
  already has adjustable arrays, so what's the point in redesigning them?

| The displacement offset is not needed for adjustable arrays.  It is only
| needed for displaced arrays.

  Sigh.  Why am I wasting any time on this?  The displacement is needed
  for the "rest" (cdr) function that people who work on lists, however
  they are actually implemented, need very often.  If you want to offer
  Lisp people something in terms of efficiency, you can't take away
  their primitives without changing the entire language under them.
  That's what I was trying to get at in the last parapraph I wrote.

| And you can place the allocation size at the start of the data array.

  Yes, but that is a truly boneheaded decision as it means that the data
  array can no longer be specialized to contain a uniform data type --
  say double-floats instead of pointers to double-floats (and you really
  want that if you go the way of arrays in the first place -- people
  even ask for specialized list types, and they won't stop if they know
  the underlying implementation is array-based), but must somehow have a
  special initial element or consist of a single word that destroys
  alignment of the rest of the array.  This is just stupid.  It's just
  like the old string representation that used the first byte to hold
  the length and couldn't hold more than 255 characters per string.

| (It would be needed there anyway if memory management code needs to
| scan memory consequtively.)

  Really?  I thought memory management code followed pointers and chains
  from a root set if it were any smart.  Randomly walking over memory it
  has no idea what contains just seems like a recipe for major disaster.
  Some conservative garbage collectors that are bolted onto languages
  that were never properly designed to deal with abandoned pointers seem
  to make do with interpreting machine words as pointers on a hunch, but
  I have always been exceedingly skeptical of the entire approach.  If
  you keep the size of a vector in a well-known structure with the
  pointer to the data array, there is only one way to get at that data
  array, anyway.

| That gives only 2 words for the header meta-object.

  And much more overhead and increased complexity and general madness.
  You have obviously never actually experimented with this in _Lisp_-
  like languages.  I don't care about optimizing other languages for
  adjustable arrays, as we already have efficient adjustable arrays in
  Common Lisp, so there's no point in reinventing the wheel, and using
  adjustable arrays to represent lists requires concern for the usage of
  lists so optimized.  I'm sure you can optimize this stuff to death in
  Java, but Java doesn't have any need for cdr-coding or more efficient
  storage of lists in particular, either, so I must admit to completely
  failing to see the whole point of this exercise, except that you get
  to argue for your preference for array-based languages, which I do not
  share, and am unlikely to begin to share just because of this.

| Of course if you want to preserve identity of lists for linked links,
| you will also need an initial header cons.

  I'm sorry, but is this a real argument?  You're talking about data
  arrays that move when they grow at the tail end, right?  Lists can
  grow at the tail end without any "header cons" simply because we just
  overwrite the cdr that is nil in the final cons with a pointer to the
  cons cell that holds the new element (and a new nil).  Lists thus
  grown retain their first cons cell, which does not move, and thus
  retains identity at the pointer level.  You do need a meta-object for
  the header of arrays because the data array moves the address of the
  first element when it is moved.

| As I said:  Only twice total, on average.

  Well, you actually said *a single time*, but if you keep doubling it
  every time you say it, I'm happy.  (Sorry, but I found that fuuny. :)

| However, my preference is for languages where the programmer does not
| usually explicitly "cdr" down a list - or even index them.

  Well, I think it is preferable to ask people who prefer the languages
  they are trying to optimize for their ideas on how to optimize them
  instead of optimizing the programmers to use some language _they_ do
  not prefer simply because the optimizers do.

  I'm confused about your purposes, I guess.  Promote APL if you like,
  but then I don't see why you're even offering suggestions for how to
  implement lists efficiently in Lisp.  This is mystifying to me.

#:Erik
-- 
  The United States of America, soon a Bush league world power.  Yeee-haw!