From: Erik Naggum
Subject: Re: Lisp XML parser ?
Date: 2000/06/23
Message-ID: <3170708641147673@naggum.no>
References: <39523341.CED20EE@bbn.com> <3170682110777797@naggum.no>
Organization: Naggum Software; vox: +47 8800 8879; fax: +47 8800 8601; http://www.naggum.no
Newsgroups: comp.lang.lisp

* Simon Brooke
| H'mmmm.... I've always considered that XML syntax was just a prolix
| way of writing sexprs.

The element structure has inherent similarities to trees made up of lists, and the significant differences are non-obvious.

| The only problem in the representation is that XML has two distinct
| types of attribute-value pairs, one of which can only take simple
| data types as values and the other of which can take structures.
| You need some way of indicating the difference but the above scheme
| (I would have thought) would make an adequate first cut.

I tend to represent *ML elements as if destructured with

    ((&rest attlist &key gi &allow-other-keys) &rest contents)

where attlist is a keyword-value plist that includes at least the key :gi, whose value is the generic identifier, a.k.a. the element type name. (There is an important distinction between attributes and contents as far as abstraction goes, but I won't go into that.)

Attribute values have a restricted set of types, but I consider this an artificial difference, not a significant one.

One significant difference is the entity structure, which is mostly used for special characters, but is really an amazingly powerful and under-understood mechanism for organizing the input sources. Lisp's syntax has nothing like it at all, and neither do other languages that could naturally represent tree structures.
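A minimal sketch of the element representation described above, assuming an element is written as a list whose car is the attribute plist (always including :gi) and whose remaining items are either strings or nested elements. ELEMENT-SUMMARY and ELEMENT-GIS are hypothetical helpers made up for illustration; the only point is that standard DESTRUCTURING-BIND accepts the lambda list exactly as given, with &rest attlist keeping the whole plist while &key gi pulls out the generic identifier and &allow-other-keys tolerates any other attributes.

```lisp
;;; Assumed (hypothetical) element shape, for illustration only:
;;;   ((:gi <name> <attribute> <value> ...) <content> ...)
;;; where each <content> is a string or another element.

(defun element-summary (element)
  "Return a list of ELEMENT's generic identifier, its whole attribute
plist, and the number of content items."
  (destructuring-bind ((&rest attlist &key gi &allow-other-keys) &rest contents)
      element
    (list gi attlist (length contents))))

(defun element-gis (element)
  "Collect the generic identifiers of ELEMENT and every nested element,
depth-first; string contents are skipped."
  (destructuring-bind ((&rest attlist &key gi &allow-other-keys) &rest contents)
      element
    (declare (ignore attlist))
    (cons gi (loop for item in contents
                   unless (stringp item)
                     append (element-gis item)))))

;; Usage:
(element-summary '((:gi "para" :id "p1") "Hello, " ((:gi "em") "world")))
;; => ("para" (:GI "para" :ID "p1") 2)
(element-gis '((:gi "para" :id "p1") "Hello, " ((:gi "em") "world")))
;; => ("para" "em")
```

Note that nothing in this representation records entity structure: by the time the list exists, every entity reference has already been replaced by its text.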
It is non-trivial to represent the entity structure and the element structure side by side, unless you only refer to entities in attribute values.

Another significant difference is the way identifiers are used to change the meaning of both the gi and the other attributes. We are not used to the operator changing meaning if we change an argument, but this is quite common in *ML contexts, to the point where the generic identifier may not even name the element type as far as processing is concerned. This means that the "processing key" is computed from the entire attribute list. Various other mechanisms with similar confusability exist, and they are bad enough that you cannot just gloss over them. The result is that you cannot really represent an *ML structure without knowing how it is supposed to be processed. It is as if you had to tell the Lisp reader whether you were reading for code or reading for data, giving up perhaps the biggest advantage of Lisp's syntax.

In short: They got it all wrong. If they had had a less involved syntax, they wouldn't have needed all the arcane details and would have had fewer chances to go off the deep end. Given that you can stuff a lot of junk into that attribute list, it just had to happen that they would do something harmful to themselves. Both Perl and C++ evolved the way they did because of syntactic mistakes like that.

#:Erik
-- 
If this is not what you expected, please alter your expectations.