Re: Lisp XML parser ? - Naggum cll archive

Subject: Re: Lisp XML parser ?
From: Erik Naggum <erik@naggum.no>
Date: 2000/06/23
Newsgroups: comp.lang.lisp
Message-ID: <3170708641147673@naggum.no>

* Simon Brooke <simon@jasmine.org.uk>
| H'mmmm.... I've always considered that XML syntax was just a prolix
| way of writing sexprs.

  The element structure has inherent similarities to trees made up of
  lists, and the significant differences are non-obvious.

| The only problem in the representation is that XML has two distinct
| types of attribute-value pairs, one of which can only take simple
| data types as values and the other of which can take structures.
| You need some way of indicating the difference but the above scheme
| (I would have thought) would make an adequate first cut.

  I tend to represent *ML elements as if destructured with

((&rest attlist &key gi &allow-other-keys) &rest contents)

  where attlist is a keyword-value plist, at least one key in which is
  the generic identifier, a.k.a. the element type name.  (There is an
  important distinction between attributes and contents as far as
  abstraction goes, but I won't go into that.)  Attribute values have
  a restricted set of types, but I consider this an artificial, not a
  significant, difference.
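  The following is a minimal sketch of using that destructuring. It
  assumes that an element is a list whose car is the attribute plist,
  with the element type stored under :gi, and whose cdr holds character
  data as strings and subelements as nested lists.  The function names
  and the sample element are hypothetical, chosen only for illustration.

```lisp
;;; Sketch, assuming an element is (ATTLIST . CONTENTS): ATTLIST is a
;;; keyword/value plist carrying the element type under :GI, CONTENTS
;;; are strings (character data) or nested elements.
;;; All names below are hypothetical.

(defun element-gi (element)
  "Return the generic identifier (element type name) of ELEMENT."
  (destructuring-bind ((&rest attlist &key gi &allow-other-keys)
                       &rest contents)
      element
    (declare (ignore attlist contents))
    gi))

(defun element-attribute (element name)
  "Return the value of attribute NAME (a keyword) in ELEMENT, or NIL."
  (getf (first element) name))

(defun element-text (element)
  "Concatenate the character data in ELEMENT and its subelements, in order."
  (with-output-to-string (out)
    (dolist (item (rest element))
      (write-string (if (stringp item) item (element-text item)) out))))

(defparameter *example*
  '((:gi "a" :href "intro.html")
    "see the "
    ((:gi "em") "introduction")))

;; (element-gi *example*)              ; => "a"
;; (element-attribute *example* :href) ; => "intro.html"
;; (element-text *example*)            ; => "see the introduction"
```

  Keeping the gi inside the attribute plist, rather than in a separate
  slot, means that any attribute can take part in deciding what the
  element is, which the ordinary &key machinery handles without help.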

  One significant difference is the entity structure, which is mostly
  used for special characters, but is really an amazingly powerful and
  under-understood mechanism for organizing the input sources.  Lisp's
  syntax has nothing like it at all, and neither do other languages
  that could naturally represent tree structures.  It is non-trivial
  to represent the entity structure and the element structure side by
  side, unless you only refer to entities in attribute values.

  Another significant difference is the way identifiers are used to
  change the meaning of both the gi and the other attributes.  We are
  not used to the operator changing meaning if we change an argument,
  but this is quite common in *ML contexts, to the point where the
  generic identifier may not even name the element type as far as
  processing is concerned.  This means that the "processing key" is
  computed from the entire attribute list.  Various other mechanisms
  with similar confusability exist, and they are bad enough that you
  cannot just gloss over them.
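  A hypothetical sketch of a processing key computed from the whole
  attribute list rather than from the gi alone.  The :arch attribute and
  the override rule are illustrative assumptions (loosely in the spirit
  of SGML architectural forms), not a description of any particular
  standard.  Elements are assumed to be (ATTLIST . CONTENTS) with the
  element type under :gi.

```lisp
;;; Hypothetical rule: an :ARCH attribute, when present, overrides the
;;; generic identifier as the key that selects how the element is processed.

(defun processing-key (attlist)
  "Return the keyword used to dispatch processing for ATTLIST."
  (destructuring-bind (&key gi arch &allow-other-keys) attlist
    ;; VALUES drops INTERN's second return value.
    (values (intern (string-upcase (or arch gi)) :keyword))))

(defun process-element (element)
  "Dispatch on the computed key, not on the gi alone."
  (ecase (processing-key (first element))
    (:section (format nil "section: ~A" (second element)))
    (:div     (format nil "plain div: ~A" (second element)))))

;; Same gi, different processing:
;; (process-element '((:gi "div" :arch "section") "Results"))
;;   => "section: Results"
;; (process-element '((:gi "div") "Sidebar"))
;;   => "plain div: Sidebar"
```

  Note that a reader of the data alone cannot tell which key applies
  without also knowing the dispatch rule.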

  The result is that you cannot really represent an *ML structure
  without knowing how it is supposed to be processed, as if you had to
  tell the Lisp reader whether you were reading for code or reading
  for data, rejecting perhaps the biggest advantage of Lisp's syntax.
  In short: They got it all wrong.

  If they had had a less involved syntax, they wouldn't have needed
  all the arcane details and would have had fewer chances to go off
  the deep end.  Given that you can stuff a lot of junk into that
  attribute list, it just had to happen that they would do something
  harmful to themselves.  Both Perl and C++ evolved the way they did
  because of syntactic mistakes like that.

#:Erik
-- 
  If this is not what you expected, please alter your expectations.