Re: XML and lisp - Naggum cll archive

Subject: Re: XML and lisp
From: Erik Naggum <erik@naggum.net>
Date: Mon, 27 Aug 2001 05:14:29 GMT
Newsgroups: comp.lang.lisp
Message-ID: <3207878053204561@naggum.net>

* Kent M Pitman <pitman@world.std.com>
> You've made allusions to alists as a way of understanding this, but as a
> sense of intuition, of course, that doesn't help a Lisp programmer a lot
> since plainly an alist is about the leftmost of each named thing, and
> people are uneasy about accessing the next-leftmost element behind
> it--that usually violates some sense of a-list/stack discipline.

  Well, this is why association lists work as a metaphor -- attributes in
  SGML/XML cannot be repeated.  If there are more keys in the remainder of
  the contents, they are not attributes.
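  Here is a minimal sketch of the association-list reading.  It is written
  in Python rather than Lisp, and it assumes a hypothetical (tag,
  attribute-alist, contents) triple built with the standard
  xml.etree.ElementTree parser.  make_element and assoc are illustrative
  names, not library functions.  A conforming parser rejects a repeated
  attribute name, so the leftmost match is the only match and there is no
  "next-leftmost" entry to worry about.

```python
import xml.etree.ElementTree as ET

def make_element(xml_text):
    """Parse XML text into a hypothetical (tag, attribute-alist, contents) triple.

    Contents are the child elements only, in this sketch.
    """
    root = ET.fromstring(xml_text)
    # ElementTree keeps attributes in a dict; turn it into a list of pairs (an alist).
    alist = list(root.attrib.items())
    return root.tag, alist, list(root)

def assoc(key, alist):
    """Return the first (leftmost) pair whose key matches, or None, like Lisp's ASSOC."""
    for pair in alist:
        if pair[0] == key:
            return pair
    return None

tag, attrs, contents = make_element('<foo bar="x" zot="y"><baz/></foo>')
print(assoc("zot", attrs))  # ('zot', 'y')

# A repeated attribute name is a well-formedness error, so the parser refuses it
# and no second "zot" can ever hide behind the first one.
try:
    ET.fromstring('<foo bar="x" bar="y"/>')
except ET.ParseError as err:
    print("rejected:", err)
```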

> You haven't offered an operator whose goal is to be like
> destructuring-bind and so to get around this, so the burden seems, to
> those looking on, to be on the programmer to pick apart this structure
> manually and the set of tools seems light.  That's probably only an
> artifact of not seeing your tools, rather than anyone's belief that you
> have no such tools.

  Which tools are available for the contents?  Why are they _not_ usable
  directly for the attributes?  I fail to grasp what you want to _do_ with
  the attributes that you cannot do with them if they are sub-elements.

  You imply that people are unable to deal with sub-elements and need
  special tools to deal with attributes.  This _must_ be wrong.
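  To make the question concrete, here is a sketch that also assumes
  Python's xml.etree.ElementTree.  It rewrites every attribute as a leading
  sub-element.  After that, the ordinary tools for contents (find, findtext,
  iteration) serve for both.  The function name attributes_to_elements is
  hypothetical.  Placing the converted attributes first, in document order,
  is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

def attributes_to_elements(elem):
    """Recursively replace every attribute with a sub-element holding its value.

    The new sub-elements go ahead of the existing children, in the order the
    attributes appeared.  The element is modified in place and returned.
    """
    # Convert the existing children first, so the new sub-elements are not revisited.
    for child in list(elem):
        attributes_to_elements(child)
    attrs = list(elem.attrib.items())
    if not attrs:
        return elem
    subs = []
    for name, value in attrs:
        sub = ET.Element(name)
        sub.text = value
        subs.append(sub)
    # ElementTree keeps leading character data in elem.text, which always
    # precedes the first child; move it after the new sub-elements instead.
    subs[-1].tail = elem.text
    elem.text = None
    for position, sub in enumerate(subs):
        elem.insert(position, sub)
    elem.attrib.clear()
    return elem

doc = ET.fromstring('<foo bar="x" zot="y"><baz quux="z">text</baz></foo>')
attributes_to_elements(doc)
print(ET.tostring(doc, encoding="unicode"))
# <foo><bar>x</bar><zot>y</zot><baz><quux>z</quux>text</baz></foo>
```

  After the rewrite, doc.findtext("zot") returns "y".  That is the same
  call one would use for any other sub-element.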

> I think it would help if you posted the NML which helps you manipulate
> these, and perhaps a small code fragment that showed an end-to-end use of
> constructing an expression in Lisp and having it appear in the XML with
> this notation Boris suggests, and the reverse.  Then people would be
> talking concrete still.

  I assume that people who voice their concerns in this discussion know
  SGML.  I have no inclination to write tutorials for people who do not.
  It is a waste of my time, and I know that I will hate it.  I have about
  500 pages of a book entitled "A Conceptual Introduction to SGML" that I
  swear to whichever deity is on duty today will _never_ be published,
  because the design flaws of SGML are so pervasive that the only thing I
  want to do with them is get rid of them.  Accept the fact that I deal
  with a history of personal pain in this regard.  I invested 6 years of my
  life in SGML and related standards, and the more I worked with it, the
  more I found that SGML actively destroyed any hope of achieving what it
  had set out to do, because it introduces several poisons into the
  conceptual processes of structuring information.  Taking a look at what
  people do with SGML and XML today has not shown _one_ case of anyone
  waking up and smelling the coffee, and it has been _burning_ in the
  coffee machine for a decade.

  This is my view: You were told that you needed attributes in addition to
  sub-element contents.  Why did you ever _agree_ to that?  The onus of
  proof is normally on him who asserts the positive, and I challenge you to
  explain to me why you _need_ attributes rather than accepting any
  challenge to explain why you do _not_ need them when what I say is that
  _you_ already know perfectly well how to deal with sub-elements.  If you
  have worked with SGML at all, you _know_ that people screw up attributes
  and sub-elements, and you _have_ had to deal with one that should have
  been the other in your processing.  It is _impossible_ to get them
  "right" because the notion that there is a "right" solution depends on
  information that is not available at the time the distinction is made.

  Over the years, I have thought of _many_ different ways to deal with the
  colossal braindamage that is attributes in SGML.  One might think of them
  as (keyword) arguments to functions, but which other information should
  influence a "function" that deals with an element?  Well, first and
  foremost, its _parentage_.  That means that I have already had to get rid
  of the notion that <foo bar="x" zot="y"> is "really" a function call like
  (foo :bar "x" :zot "y").  It has to know _so_ much more to do _anything_
  right that it is completely useless to cast one's thinking in such terms.
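  Here is a sketch of why parentage matters, again assuming Python's
  xml.etree.ElementTree.  The same <title> element, with no attributes at
  all, is processed differently depending on its parent.  The
  (foo :bar "x" :zot "y") reading has no place for that information.
  ElementTree elements carry no parent pointer, so the traversal passes the
  ancestry down explicitly.  walk and handle_title are hypothetical names.

```python
import xml.etree.ElementTree as ET

def walk(elem, ancestors=()):
    """Yield (element, ancestor tags) pairs in document order.

    ElementTree elements have no parent pointer, so the chain of ancestors
    is threaded through the traversal explicitly.
    """
    yield elem, ancestors
    for child in elem:
        yield from walk(child, ancestors + (elem.tag,))

def handle_title(elem, ancestors):
    """Hypothetical processor: the same <title> means different things by parent."""
    parent = ancestors[-1] if ancestors else None
    if parent == "chapter":
        return "chapter heading: " + elem.text
    if parent == "figure":
        return "caption: " + elem.text
    return "document title: " + elem.text

doc = ET.fromstring(
    "<book><title>A</title>"
    "<chapter><title>B</title><figure><title>C</title></figure></chapter>"
    "</book>"
)
for elem, ancestors in walk(doc):
    if elem.tag == "title":
        print(handle_title(elem, ancestors))
# document title: A
# chapter heading: B
# caption: C
```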

  SGML must be _questioned_, not accepted as gospel or natural science
  reporting on some findings.  Somebody made a decision to add attributes,
  and I know for a fact that that was back in the days of typesetting and
  document production when the idea was that you should be able to "remove"
  the "tags" and end up with the readable text of the document as it would
  be printed.  That was the _real_ rationale for attributes.  I happen to
  think that was a brilliant idea at the time -- competing markup languages
  have a serious problem in using notations that destroy the ability to
  figure out easily what is intended for the human and what is intended for
  the machine.  (In particular, TeX is a monster.)  I tended towards
  explaining to people that they should not put stuff that should not be
  displayed in sub-elements.  What a crock of shit that advice is!  As soon
  as GML became more general than producing print documents, for which it
  was well suited and still is, the attribute concept became a millstone
  around
  its neck and it dragged it down fast.  It was _wrong_ to keep attributes
  around when their rationale had been completely eradicated from its set
  of operating conditions.  It made everything incredibly complex.  I was
  one of very few people on this planet to really _study_ the standard, and
  my brain works in such a way that I still _know_ with immediate certainty
  whether something is or is not supported by the standard language and how
  to express it.  (It works the exact same way with Common Lisp, Ada (1983,
  unfortunately :), C (1991), and any number of things I have really sat
  down to study and understand, and it is so efficient that I even get an
  emotional response to violations before I see the logic of them.)  I love
  the way my brain works, but it also has serious drawbacks: Overriding and
  updating old information is something I have to work really hard at.  The
  end result of the way I think and the way the standard is defined is that
  I immediately saw these massively complex ways to do things that "nobody"
  understood.  Take HyTime and what it calls "architectural forms" -- I
  vividly remember a long walk around a quiet Tallahassee one summer night
  with the creator of this concept, when I questioned some of the designs
  and how it would be implemented, and he was quiet for the longest time
  before he said that I was probably the first person to have understood
  what he was _really_ trying to accomplish.  That would have been _such_ a
  great thing if it had been, say, rocket science, but it was not.  It was
  a man-made complexity so great that it had required _months_ of brain-
  wracking to really get my intuition working.  That was the first time I
  had really serious doubts about the wisdom of SGML's structuring process,
  because the massive complexity of it all is _completely_ pointless and a
  result of spreading the semantics so thin that you had to keep mental
  track of an enormous number of relationships to end up with an idea of
  what something should do or mean.  It does not have to be that way.  It
  was _profoundly_ disappointing to discover that at the end of this long
  process of grasping something that looked intellectually challenging lay
  only a complexity that resulted from _rejecting_ simplicity of design at
  a few crucial points.  Hell, it still took me years to figure out what
  alternatives they _should_ have picked up, and by then it was too late.

  Now you are probably thinking "how F hard can it be?" and looking
  half-condescendingly at a retarded monkey who cannot figure out the purposes of
  the mathematical relationships in calculus.  But it is the same problem
  we find in C++.  The question to be asked of massive complexity like that
  is not "what wonderful things did you find out that made this necessary",
  but "whatever did you _miss_ that made this so horribly complex"?  You
  can sometimes see people who are really, really dumb go about some simple
  tasks in a way that tells you that they have arrived at their ways of
  performing it through an incredibly painful process that they are loath
  to reopen or examine at all no matter how hard it is to get it right for
  them.  Some people will construct ways of performing their job so that
  they utilize all available brainpower, simply because that is indeed a
  very satisfying feeling.  However, when it comes to grasping someone
  else's _wrong_ ideas, there is no upper bound on complexity.  Some people
  have the most bizarrely convoluted thinking processes and they completely
  fail to monitor their thinking so they traipse off into oblivion and may
  or may not come back, but if they do, it is with these spectacularly
  irrational ideas that they _love_ before they discard them.  This is the
  kind of complexity that befell the SGML community.  That I could figure
  this mess out and think about it and have something dramatic to say about
  it to the creators, frankly scares me.

  In any case, I think the core problem is that a request for a rationale
  for _removing_ a complexifying misfeature is completely bogus.  We should
  not look at what we wound up with, we should look at _how_ we wound up
  where we are.  I have explained how attributes got invented in the first
  place and it _was_ a good idea at the time.  However, as soon as elements
  got more abstract and elements could contain _no_ information that would
  wind up on the printed page, but instead other elements that would, and
  those "abstract" elements would influence the way their sub-elements'
  contents would wind up on the printed page, it should have been clear
  that the attribute concept should be scheduled for extinction because
  some of its roles had now been moved into a different realm where _all_
  of its roles could be moved without sacrificing anything.

  The core idea that went horribly wrong with SGML _because_ of the very
  sad lack of re-examination of the rationale for attributes is so
  fundamental that removing it will tear down almost everything that SGML
  has built with it.  This is likely why people resist thinking about it,
  because it was so painful to learn SGML, it is better to keep out any
  risk of having to re-experience that pain from another angle.  I shall
  probably have to repeat this core idea forever because so few people
  really grasp it: SGML claims that some things you want to say about
  something are meta-information and some things are "normal" information.
  Like the historical baggage from the characters in the file that wound up
  in print and the characters that vanished in processing, SGML's view on
  meta-information and information is that they are inherently different
  and thus not only distinguishable, but in need of being kept apart, so
  much so that there are two wildly different languages to describe them.
  This core mistake leads to an inability to move between views of your own
  information and conceptualization of its structure, and that is just the
  way to kill your information.

  As a result of this dichotomy, SGML imposes an incredibly hard structure
  on the information.  If the information wants to break out of it, the
  whole structure breaks.  (XML is really _nothing_ better, but has all the
  appeal of tooth decay the way it touts its caries as "extensibility".)
  There are so many rules in the SGML standard that effectively prohibit a
  rational way to "flex" its design that when people refrain from flexing
  it, it is not because they do not consider it useful to be able to, but
  because any change to a document type definition is associated with an
  unknowable increase in complexity of processing, especially in the area
  of bringing legacy documents in line with the change.  The extreme
  _brittleness_ of the SGML structure is a direct result of the core
  mistake of striking a dichotomy between meta-information and
  information, because in real life,
  the two are in fact _exactly_ the same thing, it is just a matter of who
  looks at it for which purpose.  If you do not believe that, it is because
  you still think that there _has_ to be a difference.  Of course there is,
  but it is not _inherent_ or _intrinsic_ to the information; which is
  which at any given time is determined highly pragmatically.

  Structuring information is one of the _easiest_ tasks we humans do.  All
  the time, we add meta-information to information and we do not even mark
  it up as we go.  Human languages are chock full of meta-information: "I
  did not know darkness could be so illuminating", he said, expectantly.
  We _have_ no desire to mark meta-information as such, precisely because
  it is part and parcel of how we interpret what other people tell us.  If
  I say "yesterday" today, I probably mean "2001-08-26", so I could write
  <date <formal 2001-08-26> yesterday>, but I could also talk about the
  past in some general term like <date <formal past> yesterday>, and so on
  and so forth.  What we really grasp about the information we receive _is_
  invariably meta-information.  The problem is then entirely artificial,
  since we do this almost automatically.  What we would seem to need, then,
  are means to make the meta-information explicit.  I used to believe that
  this would be
  a good idea, but until we find ways to "intuit" meta-information from a
  human context, I believe it is a waste of effort and it could well be
  counterproductive.  What we need is a very limited and very practical
  approach to obtain a minimal level of meta-information.  The more we
  specify, the more we exclude, because as soon as we aim for a certain
  "depth" of representation, the alternative representations at the same
  level grow exponentially in number.
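  The <date <formal 2001-08-26> yesterday> notation above is not XML.
  Assume it is transcribed into well-formed XML as
  <date><formal>2001-08-26</formal>yesterday</date>.  A Python sketch then
  shows that the "meta-information" is an ordinary sub-element, read with
  the same calls as the human-readable text.  read_date is a hypothetical
  name.  Falling back to a plain string when the formal value is not an ISO
  date is an assumption of the sketch.

```python
import datetime
import xml.etree.ElementTree as ET

def read_date(xml_text):
    """Return (formal value, human-readable text) from a hypothetical <date> element.

    The formal value becomes a datetime.date when it is an ISO 8601 date and
    is otherwise kept as a plain string such as "past".
    """
    elem = ET.fromstring(xml_text)
    formal = elem.find("formal")
    # Character data following <formal> is stored as that element's tail.
    spoken = (formal.tail or "").strip()
    try:
        value = datetime.date.fromisoformat(formal.text)
    except ValueError:
        value = formal.text
    return value, spoken

print(read_date("<date><formal>2001-08-26</formal>yesterday</date>"))
# (datetime.date(2001, 8, 26), 'yesterday')
print(read_date("<date><formal>past</formal>yesterday</date>"))
# ('past', 'yesterday')
```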

///