Subject: Re: reading formatted text
From: (Rob Warnock)
Date: Wed, 12 Mar 2008 07:15:46 -0500
Newsgroups: comp.lang.lisp
Message-ID: <>
William James wrote:
| (Pascal J. Bourguignon) wrote:
| > But the point is that most often, formatted data has a grammar that is
| > so simple that a context free parser is overkill, and a mere regexp,
| > or even just reading the stuff sequentially is enough.
| Three fields:
| He said "Stop, thief!" and collapsed.
| 88
| x,y
| As a CSV record:
| "He said ""Stop, thief!"" and collapsed.",88,"x,y"
| Is CSV like this easy to parse?

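Pretty much so. Here's what I use [it may need tweaking for some applications: it also treats backslash as an escape character, and it skips blank lines and lines starting with "#"]: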

    ;;; PARSE-CSV-LINE -- Parse one CSV line into a list of fields,
    ;;; stripping quotes and field-internal escape characters.
    ;;; Lexical states: '(normal quoted escaped quoted+escaped)
    (defun parse-csv-line (line)
      (when (or (string= line "")           ; special-case blank lines
                (char= #\# (char line 0)))  ; or those starting with "#"
        (return-from parse-csv-line '()))
      (loop for c across line
            with state = 'normal
            and results = '()               ; finished fields, reversed
            and chars = '() do              ; current field's chars, reversed
        (ecase state
          ((normal)
           (case c
             ((#\") (setq state 'quoted))
             ((#\\) (setq state 'escaped))
             ((#\,)                         ; field separator: close field
              (push (coerce (nreverse chars) 'string) results)
              (setq chars '()))
             (t (push c chars))))
          ((quoted)
           (case c
             ((#\") (setq state 'normal))   ; so "" leaves and re-enters quoting
             ((#\\) (setq state 'quoted+escaped))
             (t (push c chars))))           ; commas are literal in here
          ((escaped) (push c chars) (setq state 'normal))
          ((quoted+escaped) (push c chars) (setq state 'quoted)))
        finally (push (coerce (nreverse chars) 'string) results) ; close open field
                (return (nreverse results))))

It handles your sample input, although each doubled quote ("") is simply dropped rather than turned into a single literal quote:

    > (parse-csv-line (read-line))
    "He said ""Stop, thief!"" and collapsed.",88,"x,y"

    ("He said Stop, thief! and collapsed." "88" "x,y")
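If you want the RFC 4180 reading instead, where a doubled quote inside a quoted field stands for one literal quote, a minimal variant might look like the following. This is only a sketch: the name PARSE-CSV-LINE-RFC4180 and the keyword state names are made up for illustration, and it deliberately drops the backslash escapes and the blank-line/"#" special cases of the version above (a blank line comes back as a single empty field).

```lisp
;;; PARSE-CSV-LINE-RFC4180 -- hypothetical variant, not the version above.
;;; Inside a quoted field, a doubled quote "" yields one literal #\".
;;; No backslash escapes, no "#" comment lines; an unterminated quote
;;; is tolerated and simply runs to the end of the line.
(defun parse-csv-line-rfc4180 (line)
  (let ((state :normal)     ; one of :normal, :quoted, :quote-seen
        (fields '())        ; completed fields, most recent first
        (chars '()))        ; characters of the current field, reversed
    (flet ((end-field ()
             (push (coerce (nreverse chars) 'string) fields)
             (setf chars '())))
      (loop for c across line do
        (ecase state
          (:normal
           (case c
             (#\" (setf state :quoted))
             (#\, (end-field))
             (t (push c chars))))
          (:quoted
           (if (char= c #\")
               (setf state :quote-seen)  ; maybe closing, maybe doubled
               (push c chars)))
          (:quote-seen
           ;; The previous quote closed the quoted section unless
           ;; this character is a second quote doubling it.
           (case c
             (#\" (push #\" chars) (setf state :quoted))
             (#\, (end-field) (setf state :normal))
             (t (push c chars) (setf state :normal))))))
      (end-field)                        ; close the last field
      (nreverse fields))))
```

On the same record it keeps the embedded quotes:

    > (parse-csv-line-rfc4180 (read-line))
    "He said ""Stop, thief!"" and collapsed.",88,"x,y"

    ("He said \"Stop, thief!\" and collapsed." "88" "x,y")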


Rob Warnock
627 26th Avenue
San Mateo, CA 94403		(650)572-2607