Subject: Re: parsing lisp expression
From: Erik Naggum <clerik@naggum.no>
Date: 1997/12/21
Newsgroups: comp.lang.lisp
Message-ID: <3091670152628940@naggum.no>


* Tim Bradshaw
| Doing the reverse shouldn't be much harder I think (you might have to
| hack readtables), so I see no need for perl.

this is not quite as easy as it seems.  first, whether an object just read
is the first element of a list or stands for itself is determined only by
the _following_ character, which is at odds with how the Lisp reader works.
second, the
end of the input must be defined externally to the syntax for lists: once
we have read a token, we cannot return until the final token is either a
non-list (maybe a semicolon or a period?) or the end of the input (maybe
end of line?).  third, while the singleton list is x[], there is no concept
of an empty list in that syntax, unless such is served by some other
special value (such as `nil').  all this means that reading that kind of
list is a quite different process from the normal `read-delimited-list'.
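
to make the first point concrete, here is a minimal sketch of what the
unmodified reader does with such input.  `read-with-standard-syntax' is a
name made up for this illustration, and the x[y z] notation is assumed to
denote the list (x y z).  in the standard readtable the brackets are plain
constituents, so they become part of the symbol name instead of delimiting
anything:

```lisp
;; hypothetical helper, not part of the parser discussed in this post
(defun read-with-standard-syntax (string)
  "Read one object from STRING using a fresh copy of the standard readtable."
  (with-input-from-string (in string)
    (let ((*readtable* (copy-readtable nil))) ;nil means standard syntax
      (read in))))

;; the token runs up to the space, brackets included
(symbol-name (read-with-standard-syntax "x[y z]")) ; => "X[Y"
```

this is why the readtable has to be touched before `read' can be of any use
here.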

so I threw this together to demonstrate that it is possible to write a
moderately compact parser for a "botched syntax", using the Lisp reader for
the real work:

(defun read-botched-syntax (&optional stream (eof-error-p t) (eof-value nil))
  "Read one object in the x[y z] syntax from STREAM; x[y z] reads as (x y z)."
  (labels ((read-botched-list ()
             ;; called with #\[ next in the stream; returns the elements
             (loop initially (read-char stream) ;discard #\[
                 until (eql (peek-char t stream) #\]) ;EQ is unreliable on characters
                 collect (read-botched-internal)
                 finally (read-char stream)))   ;discard #\]
           (read-botched-internal ()
             ;; read one object, then wrap it once per bracket group that follows
             (loop with first = (read stream)
                 for look-ahead = (peek-char t stream nil nil)
                 while look-ahead               ;stop at end of input
                 while (eql look-ahead #\[)
                 do (setq first (cons first (read-botched-list)))
                 finally (return first))))
    (let ((*readtable* (copy-readtable)))
      ;; make #\[ and #\] terminate tokens.  #'identity is never called.
      (set-macro-character #\[ #'identity nil)
      (set-macro-character #\] #'identity nil)
      (if (null (peek-char t stream eof-error-p nil))
        eof-value
        (read-botched-internal)))))
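
the readtable trick can be seen in isolation in the following sketch, which
is only an illustration and not part of the parser.  `token-then-peek' and
`never-called' are made-up names, and the erroring macro function stands in
for #'identity as an assumption of this sketch:

```lisp
;; hypothetical helper showing the terminating-macro-character trick
(defun token-then-peek (string)
  "Read one object from STRING; return it with the next non-whitespace char."
  (with-input-from-string (in string)
    (let ((*readtable* (copy-readtable nil)))
      (flet ((never-called (stream char)
               (declare (ignore stream))
               (error "unexpected ~@C at the start of an object" char)))
        ;; third argument nil makes the character a terminating macro
        ;; character, so it ends any token that precedes it
        (set-macro-character #\[ #'never-called nil)
        (set-macro-character #\] #'never-called nil))
      (let ((object (read in)))
        ;; peek-char with T skips whitespace; nil is returned at end of input
        (list object (peek-char t in nil nil))))))

(token-then-peek "x[y]")  ; the symbol X and the character #\[
(token-then-peek "x")     ; the symbol X and nil
```

the bracket ends the token but stays in the stream, which is what lets the
look-ahead decide whether a list follows.  traced by hand, the parser above
should turn f[a b[c] d] into (f a (b c) d).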

this will do lots of uninspiring things if much more than simple tokens are
present in the input, so a possible replacement that tries to limit itself
to tokens would go like this:

       (read-botched-internal ()
	 (let* ((char (peek-char t stream))
		(macro-function (get-macro-character char)))
	   ;; heuristically determine whether this will be read as a token
	   ;; this works in CMUCL and Allegro CL for Unix, not in CLISP
	   (if (or (null macro-function)        ;this handles #\\
		   (eq macro-function (get-macro-character #\A)))
	     (loop with first = (read stream)
		 for look-ahead = (peek-char t stream nil nil)
		 while look-ahead
		 while (eql look-ahead #\[)
		 do (setq first (cons first (read-botched-list)))
		 finally (return first))
	     (error 'reader-error
	       :stream stream
	       :format-control "~@<Syntax error in ~S (character ~@C).~:@>"
	       :format-arguments (list stream char))))))
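
the heuristic itself can be pulled out and tried on its own.  this is a
sketch under the same assumptions as the code above, and `token-start-p' is
a made-up name.  where `get-macro-character' returns nil for constituents,
only the first branch matters; the second covers implementations that give
constituents an internal reader function:

```lisp
;; hypothetical predicate wrapping the test used in the replacement above
(defun token-start-p (char &optional (readtable *readtable*))
  "Guess whether CHAR would start an ordinary token under READTABLE."
  (let ((macro-function (get-macro-character char readtable)))
    (or (null macro-function)           ;not a macro character at all
        ;; same function as a known constituent: treat it as one
        (eq macro-function (get-macro-character #\A readtable)))))

(token-start-p #\A (copy-readtable nil)) ; true
(token-start-p #\( (copy-readtable nil)) ; false, #\( is a macro character
```

whitespace also has no macro function, so the predicate is only meaningful
after `peek-char' has skipped it, as the code above does.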

a more complete approach would be replacing the reader macro function for
all (relevant) characters with one's own token-reader, but that's just too
much work for now.

#\Erik
-- 
If you think this year is number 97,  |  Help fight MULE in GNU Emacs 20!
_you_ are not "Year 2000 Compliant".  |  http://sourcery.naggum.no/emacs/