Subject: Re: data structure for markup text
From: Erik Naggum <>
Date: 1999/06/21
Newsgroups: comp.lang.scheme,comp.text.sgml,comp.text.xml,comp.lang.lisp
Message-ID: <>

| You are probably uniquely qualified to explain the inherent problems with
| SGML and XML.

  well, thank you.

| It would be a real public service if you were to write them up.

  it might be, but I have already donated several thousand hours of work to
  the SGML community and found it very hard to get anything in return on
  the investment despite the fact that other people benefited commercially
  and significantly so from my work, so I'm through with "public services".

| I know that I would like to read this and to be able to point others to
| this information to avoid costly future mistakes.

  I can tell you this: I started to write a book on SGML (working title: "A
  Conceptual Introduction to SGML") and received very serious interest from
  Prentice-Hall, but my work on that book brought to the surface all the
  problems I had found with SGML, such as the sorry fact that as a syntax,
  it is incredibly complex for its simple task and its semantic powers are
  way too simplistic to be worth the straitjacket.  not to mention the
  then (1994) obvious future impact of HTML on publishing, but HTML is so
  amazingly stupidly designed that whatever was left of SGML's goals and of
  the representation of information for longevity completely evaporated
  after HTML 2.0 became obsolete, both in theory and in practice.  I spent
  a year or so agonizing over the fact that it would be really expensive
  for me to cut the losses and move elsewhere, but that's what I had to do,
  because I would only get even more unhappy if I didn't get out of it:
  SGML grew in the back-end of the production line, but it really has no
  value except insofar as it is able to capture meta-information, and that
  means capturing the _intent_ of some particular forms of expression, but
  what intent can you possibly discern when you "convert" a poor user's
  struggle to get something that just looks OK out of Microsoft Word into
  some DTD that was designed to be easily mapped to Word in the first
  place?  such uses of SGML are hugely expensive wastes of time.  if SGML
  (or a similarly capable system) isn't with you from the start, you should
  be happy to get something that looks OK in print.  and if you are clever
  enough to use SGML on the front-end, chances are you will develop a
  system that far surpasses SGML, anyway, and you're back to using SGML as
  a formatting back-end if that's the kind of tools you have available.  in
  this process, all that SGML can offer is a syntax for delimiting tags on
  elements of contents, and some important restrictions on structuring that
  mean you have to be needlessly clever in the machine-generated output, such
  as using HyTime architectural forms, but by so doing, you severely limit
  the application independence of the data, and you discover that it's way
  easier to generate different SGML depending on need rather than use the
  same SGML for different uses.

  what good SGML does is in the document management arena, and sometimes, a
  straitjacket is just what you need to keep insane people in check,
  such as those who base an entire company's document base on Microsoft's
  products and secret and unintelligible document "formats".  this aspect of
  SGML _is_ very important, but once the structuring process is in the
  works, document types are defined, etc, and the applications are written,
  where did _SGML_ go, and how much did using SGML cost on top of the very
  necessary cleanup process?  if you can't do this process without SGML, by
  all means, go for it, but if you can, SGML represents additional cost and
  no benefits that cannot be obtained cheaper and better by other means.

  a succinct summary of the lessons I have learned is a pun on the old
  advice: "know SGML, forget SGML".  or in other words, transcend syntax
  and look at the concepts that SGML affords expression of, then push
  forward and desire the concepts that SGML does not afford expression of
  and see that you're better off without SGML, but needed to understand it
  in order to move further.  if you stick with SGML, you hit your head on
  the glass ceiling and spend all your effort being clever within very
  restricted bounds -- that's the waste of time you should avoid.

  however, that said, if you find yourself comfortable within the bounds of
  what SGML makes relatively easy and desire no more, I'm not going to ask
  you to choose a better approach to a problem you don't have.  if you want
  to use the quality tools that eat SGML, design simple DTDs that capture
  structure in a natural way relative to the end result, don't try to be
  clever in the DTD design phase.  if it's hard to do in SGML, use some
  other language or tool, such as a real programming language and database.

  the core problem is that neither SGML, nor HTML, nor XML actually _scale_
  or help you _evolve_, and human endeavors are wont to grow and evolve.
  longevity and stability are not to be found in solid structure, but in the
  ability to turn around effortlessly yet _without_ losing the past.  SGML
  is just as bad as any other static structure in that latter regard.

  anyway, this discussion started (as far as I got into it, anyway) with a
  desire to represent SGML-like structure in a dynamic programming language
  such as Common Lisp, and that's the next step: when your data can easily
  be interpreted as function calls, you can write programs that produce the
  desired output as you "execute" the document.  the syntax you use for
  this is not particularly important and it might as well be SGML if you
  can trivially read it into the program and process it, but SGML is such a
  pain to read and process that you should think twice about using it.  put
  another way: whether you write <foo bar=1>zot</foo> or ((foo :bar 1)
  "zot") is not important; that you think in terms of function calls,
  programming languages, and dynamic semantics is.  but why then accept all
  the bother with a truly arcane syntax?  _that's_ the question I couldn't
  answer.  watching other people struggle like mad with the syntax and not
  even _getting_ the semantic relationships and purposes was what tipped me
  off: SGML had a constructive goal, but is counter-productive in practice.
  the productive approach is to understand the constructive goal and do it
  in some much less involved way.  now, if you want to do that, I'm all ears.
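
  a minimal sketch of the "data as function calls" idea, under stated
  assumptions: Python stands in for Common Lisp here, nested tuples stand
  in for the list form ((foo :bar 1) "zot"), and the names execute,
  foo_as_markup and foo_as_text are hypothetical, invented only for this
  illustration.  each element name is looked up in a table of functions
  and called with its content and attributes, so the same document can be
  "executed" into different output just by swapping the table.

```python
def execute(node, handlers):
    """Evaluate a document node: strings are text, tuples are calls."""
    if isinstance(node, str):
        return node
    # A node looks like ((name, attributes), child, child, ...).
    (name, attrs), *children = node
    body = "".join(execute(child, handlers) for child in children)
    # The element name selects the function; attributes become keywords.
    return handlers[name](body, **attrs)


def foo_as_markup(body, bar=None):
    # One possible meaning of foo: reproduce the tagged form.
    return f"<foo bar={bar}>{body}</foo>"


def foo_as_text(body, bar=None):
    # Another meaning of foo for the same data: plain emphasized text.
    return body.upper()


doc = (("foo", {"bar": 1}), "zot")

print(execute(doc, {"foo": foo_as_markup}))  # <foo bar=1>zot</foo>
print(execute(doc, {"foo": foo_as_text}))    # ZOT
```

  the point of the sketch is only that the meaning of foo lives in the
  program, not in the syntax that delimits it.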
@1999-07-22T00:37:33Z -- pi billion seconds since the turn of the century