Re: why is this code inefficient??

Subject: Re: why is this code inefficient??
From: Erik Naggum <clerik@naggum.no>
Date: 1998/07/29
Newsgroups: comp.lang.lisp
Message-ID: <3110739880597756@naggum.no>

* kp gores
| why is this code so inefficient?  where do i waste so much space??

  presuming it is only consing that causes the waste, let me try and
  synthesize some of the other responses and add a little of my own:

|     (do
|      ((line (read-line input-file nil *eof*)(read-line input-file nil *eof*))

  READ-LINE frequently conses much more than one might think.  if you
  allocate a string buffer (henceforth called <buffer>) to write into with

(make-array <some guess>
	    :element-type 'character
	    :adjustable t
	    :fill-pointer 0)

  and fill it with some loop like

(loop initially (setf (fill-pointer <buffer>) 0)
      for character = (read-char <stream> nil nil)
      when (null character)
      do <some non-local exit>
      until (char= character #\newline)
      do (vector-push-extend character <buffer>))

  you will keep the same buffer all the time, and it will grow to
  accomodate the longest string in a space-economical fashion.

|       (spelling "")
|       (pronounciation "")
|       (word-class-code "")
|       (verb-pattern ""))

  these appear to be mere substrings into the line you just read, but you
  don't use them for anything other than to cons up new strings.  I'd do
  away with these altogether and instead use indices into the buffer.

|      (setf spelling (mytrim (subseq line 0 23))) ;; doc: 1-25

(position-if-not #'whitespace-char-p <buffer> :start 0 :end 23)
(position-if-not #'whitespace-char-p <buffer> :start 0 :end 23 :from-end t)

  will produce the starting and ending positions of a substring of buffer
  that is the same under EQUAL as your expensive solution.

|                             (nsubstitute
|                              #\Space
|                              #\,
|                              (subseq line 46 69)))) ;; doc: 51-70

  SUBSTITUTE and friends take :START and :END arguments -- use them.

|      (princ "(setf (gethash " output-file)
|      (prin1 spelling output-file)
|      (princ " " output-file)
|      (princ 'dict output-file)
|      (princ ") '(" output-file)
|      (prin1 word-class-code output-file)
|      (princ "))" output-file)

  if you are merely going to write out substrings, you can use WRITE-STRING
  and use its :START and :END arguments to delimit the substring written.
  this will require that you check for backslashes and quotation marks in
  the string written, and that might not be cost-effective.  if you can
  statically determine that they don't appear in the material, you can save
  a lot this way, however.  you can write (write-string " dict) '(" ...)
  directly, since PRINC will print the symbol-name of a symbol.

  if you want to use strings, I would instead go for displaced arrays into
  the pre-allocated buffer, and then you can do something even nicer.
  preallocate the variables with this general form:

(make-array 0
	    :adjustable t
	    :element-type 'character
	    :displaced-to <buffer>
	    :displaced-index-offset 0)

  and construct the list to be printed like this once and for all, i.e.,
  outside of the loop:

`(setf (gethash ,spelling dict) '(,word-class-code))

  now _move_ the displaced array instead of consing a new array header for
  `spelling' and `word-class-code'.

(adjust-array <array> (- <end> <start>)
	      :adustable t
              :displaced-to <buffer>
              :displaced-index-offset <start>)

  this should not cons a new array header and the value _should_ be EQ to
  the <array> argument.  (I don't find any guarantees to this effect right
  now, but it appears to have been the intention.  you might want to
  check.)  now you can print this whole list with PRINT, and it will take
  care of itself.  I don't see any reason why printing this list should
  cons.

  glue this stuff together and let me know how it works...

#:Erik
-- 
  http://www.naggum.no/spam.html is about my spam protection scheme and how
  to guarantee that you reach me.  in brief: if you reply to a news article
  of mine, be sure to include an In-Reply-To or References header with the
  message-ID of that message in it.  otherwise, you need to read that page.