Subject: Re: Back to character set implementation thinking
From: Erik Naggum <email@example.com>
Date: Sun, 31 Mar 2002 02:59:34 GMT
Newsgroups: comp.lang.lisp
Message-ID: <firstname.lastname@example.org>

* Brian Spilsbury
| A string cannot use non-vector substrate in CL, if it were
| fundamentally a sequence, then it could, as long as that substrate
| satisfied sequence.

  As I said, we have a terminological problem here.  vector and list are
  disjoint subclasses of sequence.  string is a subclass of vector.

| from memory vectors are not necessarily O(1) random access in CL,

  This might be at the core of your confusion.

| When I say sequence, I mean the type-definition, rather than a particular
| data-type.

  I know Common Lisp too well to understand what you mean.

| Lists have support for random access implemented via sequential
| accessors.  Vectors have support for linear access implemented via random
| accessors.

  No, this is really fundamentally confused.  Random access _means_ O(1).
  Linear access means that you have a first-class pointer to each element,
  required to access the next.  Both the cons cell and the stream satisfy
  the latter.

| The real problem is that sequence doesn't define any iterative operators,
| only cons [as list] does via cdr/rest and dolist, and the ad-hoc support
| via loop.

  What is "ad-hoc" about it?  This is very puzzling.

| I do not think that limiting yourself to a single mark/point pair, nor
| keeping a mark/point in the container, where any modification propagates
| side-effects, is a particularly good strategy for lisp.

  I think you should read what I write a little better.  It is vital that
  mark and point are _not_ part of the string, but of the iterator.  I have
  said as much.  Please do not rudely ask me to waste my time to refute
  conclusions based on things I have not said.

| I think it is relatively straightforward, in some encodings the amount
| of state might be annoyingly large, though.

  Well, we just appear to have different tolerance of necessities, or you
  know some encodings I do not, which I kind of doubt.  An example of a
  stateful encoding with an annoyingly large amount of state would be
  useful so I know where the amount becomes annoyingly large.

| In the standard compression scheme for unicode you need to save
| Single-Byte-Mode-P, Current-Window, and the 8 Dynamic-Window-Offsets, and
| Locking-Shift-P, I've only glanced over the spec, so please excuse
| omission or error.

  Seems pretty accurate.

| The unicode SCS is pretty heavy on state, I'll agree, that's 11 words
| in the most conservative form, although there are various
| optimisations you could apply, I might expect to represent that in 5
| 32-bit words with packing.

  This is so heavy on state you want to optimize the storage?  My good man,
  this is nothing and not worth optimizing.

| The other advantage is that we don't need to store the state in the
| string at all, the transitory state is kept in the iterator (ie,
| dosequence, map, subseq, etc), and this means that we can share the
| string freely between readers, as we currently expect to be able to.

  I am really curious now.  You _always_ store the state in the object that
  modifies it, _never_ in the object it refers to.  Doing the latter is a
  peculiar C++ disease which I had the good fortune of discussing with a
  project leader who just had to vent his frustration with some of his
  programmers and their sheer inability to write threadsafe code precisely
  because they were hell-bent on "optimizing" data storage and stored the
  state of an iterator in the object iterated over.  I wondered how anyone
  could even think of such an obviously boneheaded thing, but these people,
  he told me, were so deeply concerned with not using dynamic memory and
  conserving memory in general that they made this idiotic coding practice
  a matter of _pride_ and would therefore not consider changing it, even
  when ordered to fix the problem.
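
  As a minimal sketch of keeping mark and point in the iterator rather than
  in the string, something like the following would do in Common Lisp.  The
  names string-iterator, iterator-next, iterator-set-mark and
  iterator-region are hypothetical and made up for illustration; they are
  neither standard operators nor anything proposed in this thread.

```lisp
;; Hypothetical names throughout; only the COMMON-LISP operators are real.
(defstruct (string-iterator (:constructor make-string-iterator (source)))
  (source "" :type string :read-only t) ; shared, never modified by iterating
  (point 0 :type fixnum)                ; index of the next character to read
  (mark 0 :type fixnum))                ; a remembered position

(defun iterator-next (iterator)
  "Return the next character and advance POINT, or NIL at the end."
  (let ((source (string-iterator-source iterator))
        (point (string-iterator-point iterator)))
    (when (< point (length source))
      (setf (string-iterator-point iterator) (1+ point))
      (char source point))))

(defun iterator-set-mark (iterator)
  "Remember the current POINT as MARK."
  (setf (string-iterator-mark iterator) (string-iterator-point iterator)))

(defun iterator-region (iterator)
  "Return a fresh string holding the text between MARK and POINT."
  (subseq (string-iterator-source iterator)
          (string-iterator-mark iterator)
          (string-iterator-point iterator)))
```

  Two iterators made over the same string advance independently, and the
  string itself is never written to, so it can be shared freely between
  readers.  State for a stateful encoding would presumably live in further
  slots of the same structure.
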
  Thread safety or, more generally, the ability to have multiple references
  to the same object, is the Lisp way, and being anal about memory usage is
  not the Lisp way.

| I think that a lot of state is the exception rather than the rule.

  You are actually wrong about this.  The ideal of statelessness is
  generally a very bad idea, as it tries to hide state under the rug.
  Generally, state can be layered, and this is good, but it is therefore
  extremely important to layer it correctly.  I mean, I thought this would
  be exceptionally obvious when we have a string-stream concept that can
  iterate over a string with stream operators, but you have to be explicit
  about setting up these iterators.  (It should have been more general, so
  one could iterate over the elements of a vector with read-byte.)

| I also think that as shown above, we can externalise that state into
| points, at an acceptable cost for reasonable encodings.

  I truly wonder how you could have thought that anyone would want to store
  the iteration state in the object iterated over.  That is such a classic
  mistake that I am annoyed that I have to argue against it.

| It may be that I am unaware of some more complex common encodings, if
| there are any that you are thinking of in specific, please let me know.

  Try implementing a full ISO 2022 processor, try representing the device
  that ISO 6429 (informally known as "ANSI escape sequences") writes to, or
  consider the amount of state in a fully fledged MIME processor.
  Side-effects and modifying state is a good thing, but it must, of course,
  be localized with the functions that maintain the state, not with the
  object that is being referenced incidentally.  Or maybe this is just that
  annoyingly stupid Object Oriented Programming thing, again, where the
  object itself is supposed to know something about how it is used.  This
  is just plain bad design.
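
  The string-stream remark above can be illustrated with the standard
  with-input-from-string macro.  This is only a sketch; the function name
  read-in-lockstep is hypothetical.  Each stream carries its own position,
  so two of them can read the same string without disturbing each other or
  the string.

```lisp
;; READ-IN-LOCKSTEP is a made-up name; WITH-INPUT-FROM-STRING and READ-CHAR
;; are standard Common Lisp.
(defun read-in-lockstep (string)
  "Read STRING through two independent string streams, the second running
one character ahead.  Return a list of (character . following-character)."
  (with-input-from-string (current string)
    (with-input-from-string (next string)
      (read-char next nil)              ; advances only the second stream
      (loop for c = (read-char current nil) ; NIL at end of string
            for n = (read-char next nil)
            while c
            collect (cons c n)))))
```

  The position is state that belongs to the stream, set up explicitly by
  the caller, which is the layering argued for above.
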
  Stuffing "next" pointers into a structure to build a linked list is
  equally nuts, but many believe this is good and cannot fathom the point
  of using a vector or a linked list that points to the objects in
  question.  Such people should be kept away from computers.

///
-- 
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.