From: james anderson

Subject: Re: XML parser and line feeds between tags

Date: 2003-12-16 12:16

it is not clear that the whitespace to which you refer should not be 
included as character data content.

were you referring to whitespace outside of the document element (the 
'team' element), you would be correct. the whitespace text in 
question, on the other hand, constitutes part of the mixed content of 
the respective elements.

the passage in the document which you quoted, 
http://www.xml.com/axml/target.html#sec-white-space, specifies that the 
application must receive the whitespace. one of the deficiencies of the 
xml specifications is that nothing says what the "application" is. from 
one perspective, pxml-the-parser must supply the whitespace to 
pxml-the-modeling-application, which then duly includes it in the model.

note that, from reading through the pxml source, the :content-only t 
argument should not have any effect on the result model, as it pertains 
to things like comments, processing instructions, and declarations. if 
you look, for example, at 
http://www.xml.com/pub/a/2001/12/05/sax2.html, you will see a 
distinction between how ignorable and non-ignorable whitespace is 
reported through a sax interface, which could put an application in a 
position to ignore ignorable whitespace. it does not appear that pxml 
makes that distinction - at least not the version i have here.

there would need to be a distinct switch which specified, for example, 
significant-whitespace-only in order to achieve the effect you desire. 
even in that case, one would need to include an attribute to that 
effect in the respective element or establish the requisite default 
value in the document definition.
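
absent such a switch, one workaround is to filter the result model after 
parsing. the following is only a sketch, in plain common lisp, under the 
assumption that the model has the shape shown in your message: each element 
is a list whose first item is the tag (a symbol, or a list of the symbol 
followed by attributes) and whose remaining items are the content. the names 
whitespace-char-p, whitespace-string-p and strip-whitespace are hypothetical 
helpers, not part of pxml.

```lisp
;; hypothetical helpers, not part of pxml: remove whitespace-only strings
;; from the content of each element in a parsed model.

(defun whitespace-char-p (char)
  "true for the four characters xml 1.0 treats as white space."
  (member char '(#\Space #\Tab #\Newline #\Return)))

(defun whitespace-string-p (object)
  "true if OBJECT is a string made up only of white space (or is empty)."
  (and (stringp object)
       (every #'whitespace-char-p object)))

(defun strip-whitespace (node)
  "return a copy of the element NODE without whitespace-only content strings.
the tag, including any attribute list, is left untouched, so an attribute
value which happens to be blank is not removed."
  (if (consp node)
      (cons (first node)
            (mapcar #'strip-whitespace
                    (remove-if #'whitespace-string-p (rest node))))
      node))

;; the model from the original message, written out by hand.
(defparameter *parsed*
  (list (list 'team
              (format nil "~% ")
              '((person id "b001" name "laurent eschenauer"))
              (format nil "~% ")
              '((person id "b002" name "cedric gauthy"))
              (format nil "~%"))))

;; the parse result is a list of top-level items, so map over it
;; rather than passing the whole list as if it were one element.
(defparameter *stripped* (mapcar #'strip-whitespace *parsed*))
```

note that this discards every whitespace-only string, including any which are 
significant in genuinely mixed content. without a document definition, neither 
the parser nor a filter such as this can tell the two cases apart.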


...

On Tuesday, Dec 16, 2003, at 14:02 Europe/Berlin, Laurent Eschenauer 
wrote:

> Hello everyone,
>
> I have an issue with the xml parser in ACL 6.2 (pxml) when using line
> feeds. Looking at the XML specs, I understand that the XML parser should
> ignore line feeds and extra whitespace. However when I parse the following
> file with ACL 6.2 :
>
> <team>
> <person id="b001" name="laurent eschenauer"/>
> <person id="b002" name="cedric gauthy"/>
> </team>
>
> Using the command :(parse-xml stream :content-only t)
>
> I receive:
>
> ((team "
> " ((person id "b001" name "laurent eschenauer")) "
> " ((person id "b002" name "cedric gauthy")) "
> "))
>
> As you can see, all line feeds are handled by the parser as token
> while they should not be visible (according to the XML specs at
> http://www.xml.com/axml/testaxml.htm).
>
> Am I missing something here ? Anyone got a similar problem ?