From: Erik Naggum
Subject: Re: string processor
Date: 2000/08/10
Message-ID: <3174939482743037@naggum.net>
References: <39923173$1@naylor.cs.rmit.edu.au>
Organization: Naggum Software; vox: +47 8800 8879; fax: +47 8800 8601; http://naggum.no; http://naggum.net
Newsgroups: comp.lang.lisp

* fsumera@cs.rmit.edu.au (Irma)
| is there any way to seperate parts of a sentence

There's the function position to find the position of a single element in a sequence, and the function search to locate a subsequence. Then there's the function subseq to extract a subsequence.

| (something like stringtokenizer in java)

Well, instead of implementing a class like StringTokenizer with Enumeration methods like hasMoreElements and nextElement, the customary Common Lisp approach is to return a list of tokens. Whether you wish to iterate over the returned sequence of tokens or do something else with it is up to you, but if you iterate, you do it with ordinary list traversal.

| for example i've got the string : "things1 : things2"
| and i want to seperate them by the ':'

There are basically two kinds of separators: individual characters (as implemented by java.util.StringTokenizer) and strings. I tend to favor the strings model, as it allows more interesting cases. Individual characters are also a special case of strings, since a single character is just a string of length one. There are also basically two ways to deal with separators: either you repeatedly match any one of a set (the StringTokenizer default), or you match the separators in a list, in order (nextToken with an argument).
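To make the character-set model concrete for the question asked, here is a minimal sketch built only from position-if, position-if-not, and subseq. It assumes StringTokenizer-like choices: any character from a set is a separator, separators are discarded, and runs of separators yield no empty tokens. The name split-on-chars is hypothetical, not a standard Common Lisp function.

```lisp
(defun split-on-chars (string delimiters)
  "Hypothetical helper: split STRING at any character found in the string
DELIMITERS.  Separators are discarded and runs of them are collapsed, so no
empty tokens are returned (like java.util.StringTokenizer's default)."
  (flet ((delimiterp (char) (find char delimiters)))
    ;; START is the first non-separator at or after the previous token's END;
    ;; END is the next separator after START, or NIL if the token runs to the end.
    (loop for start = (position-if-not #'delimiterp string)
            then (position-if-not #'delimiterp string :start end)
          for end = (and start (position-if #'delimiterp string :start start))
          while start
          collect (subseq string start end)
          ;; A NIL END means the last token reached the end of STRING.
          while end)))
```

With the separator set " :", the call (split-on-chars "things1 : things2" " :") returns ("things1" "things2"), because the space and colon between the words collapse into a single break.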
Then there are basically two ways to return tokens: either your separators are noise and you discard them, or you return them as tokens in their own right (StringTokenizer offers both, selected by the returnDelims argument to its constructor). Finally, there are basically two ways to treat two separators back to back in the target string: either there is an empty token between them, or there isn't (the StringTokenizer way).

Suppose we make an arbitrary selection from these choices: string separators, matched in order, discarded as noise, with empty tokens between back-to-back separators. (Symbols with <> around them would be lambda-bound.)

    (loop with next
          for delimiter in <delimiters>
          for match-start = <start> then next
          for match-end = (search delimiter <string> :start2 match-start :end2 <end>)
          while match-end
          collect (subseq <string> match-start match-end)
          do (setq next (+ match-end (length delimiter))))

<start> would default to 0, and <end> to nil (which means the length of the target sequence). Incidentally, note that none of the code above knows it's dealing with strings, but it would be very inefficient for lists of stuff.

Considering the number of options, and the ease with which you can write a loop that collects your particular substrings, this is an argument for the software-pattern mode of solution: using a fully general tokenizer is more complex than writing out the pattern. This was highlighted by a long-winded discussion on how to implement a tokenizer some time ago; it grew ever more powerful and harder to use. Incidentally, I think java.util.StringTokenizer is silly both in its limitedness and its complexity of use, but studying Java is not a very entertaining task, except maybe if you're coming at Java from below, like from C++.

| I've been using the 'read-from-string' function to get one string at
| a time but i guess that must be a better way to that.

You _really_ don't want to use the Lisp reader unless you have Lisp objects of some sort. Just trust me on this for now; there's no need to figure it out the hard way.
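The LOOP sketch with the <delimiters>, <string>, <start>, and <end> placeholders can be wrapped into a complete function. This is a minimal sketch under one added assumption: it also returns whatever text follows the last matched separator as a final token, which the bare loop does not collect. The name split-by-delimiters and its keyword arguments are hypothetical.

```lisp
(defun split-by-delimiters (string delimiters &key (start 0) end)
  "Hypothetical helper: split STRING using the strings in DELIMITERS,
matched in order.  Separators are discarded; back-to-back separators
yield empty tokens."
  (let ((end (or end (length string))))      ; NIL END means the whole string
    (loop with next = start
          for delimiter in delimiters
          for match-end = (search delimiter string :start2 next :end2 end)
          while match-end                    ; stop when a separator is missing
          collect (subseq string next match-end) into tokens
          do (setq next (+ match-end (length delimiter)))
          ;; Assumption beyond the bare LOOP sketch: keep the remaining
          ;; text after the last matched separator as a final token.
          finally (return (append tokens (list (subseq string next end)))))))
```

For the original example, (split-by-delimiters "things1 : things2" '(" : ")) returns ("things1" "things2"), and (split-by-delimiters "a,,b" '("," ",")) returns ("a" "" "b"), showing the empty token between back-to-back separators.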
#:Erik -- If this is not what you expected, please alter your expectations.