Re: data structure for markup text

Subject: Re: data structure for markup text
From: Erik Naggum <erik@naggum.no>
Date: 1999/06/25
Newsgroups: comp.lang.scheme,comp.text.sgml,comp.text.xml,comp.lang.lisp
Message-ID: <3139321655035574@naggum.no>

* Norman Gray <ng3j@udcf.gla.ac.uk>
| But we seem to have that already, in the form of (La)TeX, and it seems
| to be a bad solution to the problems of document structuring for reuse
| (let us avoid the issue of `nice syntax'...).

  well, if you program in (La)TeX, I share your pain.  there's nothing in
  TeX that forbids being smart, but both the syntax and the semantics sure
  make it a pain to write reasonably capable functions and abstractions.

| However, the system was crumbling precisely because it had the features
| Erik seems to be advocating: it's Turing complete, so authors could do
| clever things.

  one problem with TeX is the low level at which you can do clever things.
  another problem is that you _have_ to be clever, because straightforward,
  regular smartness doesn't cut it.

| I ended up feeling that the fundamental reason this is a bad solution is
| that it allows the authors to be too clever.

  I don't think society as a whole condones lobotomy just because somebody
  is considered too smart for their own good.

| If an author includes a burst of macro magic in a document -- either to
| deal with some particular problem, or just because they're bored writing
| documentation -- it's quite easy to make it effectively unprocessable in
| anything other than the original context.

  ah!  you completely misunderstand my use of "macro".  not so strange,
  considering your TeX background and my Lisp background, but let me
  clarify: a macro in Lisp is a form that takes a bunch of arguments and
  returns a _new_ form that replaces the first form.  TeX macros are more
  like functions that actually do their own stuff.  that is, a TeX macro
  implements, while a Lisp macro expresses.
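  to make the distinction concrete, here is a minimal sketch in Python (not
  Lisp, and not anything from the original discussion), assuming markup
  forms are modeled as (tag, attrs, children) tuples; `expand_note` and
  `tex_style_note` are hypothetical names.  the first one expresses: it
  returns a new form that replaces the old one.  the second one implements:
  it writes formatted output itself and leaves no structure behind.

```python
# A minimal sketch, not from the original post: markup forms are modeled
# as (tag, attrs, children) tuples; all names here are hypothetical.

def expand_note(form):
    """Lisp-style macro: return a NEW form that replaces the input form.
    It produces no output itself; it only expresses structure, using
    elements (para, emph) that a hypothetical DTD already defines."""
    tag, _attrs, children = form
    assert tag == "note", "this macro only expands <note> forms"
    return ("para", {"role": "note"}, [("emph", {}, ["Note: "])] + children)


def tex_style_note(form, out):
    """TeX-style 'macro': does its own work by emitting formatted output
    directly, so no structured form survives for later reprocessing."""
    _tag, _attrs, children = form
    out.append("\\par\\textbf{Note: }")
    out.extend(children)


note = ("note", {}, ["back up first."])
print(expand_note(note))
# ('para', {'role': 'note'}, [('emph', {}, ['Note: ']), 'back up first.'])

out = []
tex_style_note(note, out)
print("".join(out))
# \par\textbf{Note: }back up first.
```

  the expanded form can still be validated, transformed, or rendered by any
  later processor; the TeX-style output can only be typeset.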

  suppose you have a list of somethings and the DTD doesn't support
  defaulting or computing attributes from the last use within a particular
  element (sounds familiar? ;), you could write a macro that expands the
  "tags" of the sub-elements into having that uniform attribute.  in other
  words, you do NOT go below the language defined by the DTD, but you
  enable interesting ways to _produce_ the structure with interestingly
  defaulted or "computed" attributes.  the output of the macro is just what
  you expect it to be: valid SGML (function calls), not random noise or
  low-level magic like what TeX produces.  another important point with Lisp
  macros is that they rewrite one form into another, but it's always a
  one-to-one transformation.  (if you need any number other than 1, you also
  need a generic container form, which SGML also lacks: such a form breaks
  all possibility of validating the structure unless the core of the
  semantics supports it, and SGML lacks a general "list of things" concept,
  too.)
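  here is a hypothetical sketch of such a macro under the same assumptions
  (Python, forms as (tag, attrs, children) tuples, made-up element and
  attribute names): each <item> lacking a `lang` attribute inherits the
  value from the last item that specified one.  the whole list form goes in
  and exactly one list form comes out.

```python
# A hypothetical sketch: forms are (tag, attrs, children) tuples and the
# element/attribute names are made up for illustration.  Assumes every
# child of the list is an element form, not bare text.

def default_from_last(form, attr):
    """Rewrite one list form into one new list form in which every child
    lacking `attr` inherits the value of the most recent child that set it.
    The input form is left unmodified."""
    tag, attrs, children = form
    last = None
    new_children = []
    for ctag, cattrs, ckids in children:
        if attr in cattrs:
            last = cattrs[attr]              # remember the latest explicit value
        elif last is not None:
            cattrs = {**cattrs, attr: last}  # copy, don't mutate the input
        new_children.append((ctag, cattrs, ckids))
    # One form in, one form out, using only elements the DTD already has.
    return (tag, attrs, new_children)


src = ("list", {}, [
    ("item", {"lang": "en"}, ["colour"]),
    ("item", {}, ["flavour"]),
    ("item", {"lang": "no"}, ["farge"]),
    ("item", {}, ["smak"]),
])
for item in default_from_last(src, "lang")[2]:
    print(item)
# ('item', {'lang': 'en'}, ['colour'])
# ('item', {'lang': 'en'}, ['flavour'])
# ('item', {'lang': 'no'}, ['farge'])
# ('item', {'lang': 'no'}, ['smak'])
```

  the result is still a list of items and nothing else, so a validating
  parser sees exactly the structure the DTD allows; only the attributes
  were computed.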

  now, if you had noticed that I want tags-with-attributes to be function
  _call_ forms, obviously into the application language of which the DTD
  is a small part, you would not have confused a macro with a function.  I
  was also very explicit in making the markup execution model fundamentally
  different from the Lisp execution model, namely that Lisp is inside-out,
  while markup needs to be outside-in.
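  the difference between the two execution models can be sketched the same
  way, again with hypothetical names and forms as (tag, attrs, children)
  tuples.  `inside_out` evaluates the children before the parent, the way
  Lisp evaluates arguments before the call; `outside_in` runs the parent
  first and hands context down, so text inside <emph> can be rendered
  differently from text outside it.

```python
# A minimal sketch with hypothetical element names; attributes are ignored.

def inside_out(form, trace):
    """Lisp order: evaluate every child (argument) first, then the parent."""
    if isinstance(form, str):
        return form
    tag, _attrs, children = form
    args = [inside_out(c, trace) for c in children]
    trace.append(tag)                 # parent runs after all its children
    return f"{tag}({', '.join(args)})"


def outside_in(form, trace, context=()):
    """Markup order: the parent runs first and passes context down."""
    if isinstance(form, str):
        # Text is rendered according to the context its ancestors set up.
        return form.upper() if "emph" in context else form
    tag, _attrs, children = form
    trace.append(tag)                 # parent runs before its children
    inner = context + (tag,)          # the ancestry each child will see
    return "".join(outside_in(c, trace, inner) for c in children)


doc = ("section", {}, [
    ("title", {}, ["Intro. "]),
    ("para", {}, ["See ", ("emph", {}, ["this"])]),
])

t1, t2 = [], []
inside_out(doc, t1)
print(t1)                   # ['title', 'emph', 'para', 'section']
print(outside_in(doc, t2))  # Intro. See THIS
print(t2)                   # ['section', 'title', 'para', 'emph']
```

  in the inside-out walk, "this" is already evaluated before anything
  knows it sits inside <emph>; in the outside-in walk, the context arrives
  first.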

  TeX is the wrong approach, but it is not because it has computational
  power: it's because it's a glorified assembly language for formatters.

| I can't help feeling that if you restrict what an author _can_ do, then
| you limit what a future parser _has_ to do to repurpose the document, and
| that this is a good thing.

  while your empirical basis is significant with respect to TeX, I would
  recommend that you not run away scared just because you see something
  scary that reminds you of how much TeX hurts.

| Once you have constructed a grove from the input document, is it not
| then equivalent to the document expressed in Lisp,

  this doesn't seem like a valid or even likely conclusion to me.  how did
  you arrive at it?

| In other words, while the document-as-characters-plus-tags is agreeably
| restricted (from my point of view as the author of the SGML application,
| as _well_ as from my point of view as the author of the documentation --
| I want to write the words, not do clever things with the document), the
| document-as-directed-acyclic-graph-of-properties seems agreeably flexible
| and functional.

  you know, I don't think syntax is irrelevant to semantics, because syntax
  is relevant to pragmatics, and if the pragmatics of clean semantics is on
  par with the pragmatics of ugly semantics, people will choose randomly,
  meaning they get it ugly 90% of the time, given human nature and all. ;)
  if you make ugly cost 10 times more than beautiful, you'll get a much
  nicer end result.  this is also why 90% of the languages are butt ugly,
  and why 90% of the code written in them is ugly.  Lisp has a tradition of
  making ugly hugely expensive and beautiful cheap, which means only 50% of
  the Lisp code is ugly, given human nature and all.

  if you haven't noticed, I'm trying very hard to invalidate your empirical
  basis from TeX and Perl and other atrocities, syntax-, pragmatics-, and
  semantics-wise.  it may seem like a cliché, but people actually do
  think differently when they deal with nice things.

#:Erik
-- 
@1999-07-22T00:37:33Z -- pi billion seconds since the turn of the century