Subject: Re: MD5 in LISP and abstraction inversions
From: Erik Naggum <erik@naggum.net>
Date: Thu, 01 Nov 2001 15:30:22 GMT
Newsgroups: comp.lang.lisp
Message-ID: <3213617417316421@naggum.net>

* Francois-Rene Rideau <fare+NOSPAM@tunes.org>
| Ask whom?

  This newsgroup, for instance.

| If someone was willing to publish code for others to use, he'd already
| have done it.  Or at least announced it on a webpage.

  That has actually been done.  I will certainly not blame you for not
  finding things on the Net, however.  That is why asking those wetware
  search engines called "humans" is still a very good idea.

| So far, the only advertised Common LISP implementation of MD5 is Franz'
| ACL6's - which doesn't quite fulfill my needs, and is embedded in their
| application rather than available as portable common lisp.

  It is not an implementation in Common Lisp, but written in C as a
  low-level system function.  There seem to be serious performance
  improvements in the upcoming 6.1 release.

| The ACL version is about twice slower than C.

  Well, that much is fairly odd, considering that it is written in C.  I
  only noticed that it consed like mad because it used C space for a copy
  of every string, and that C space was neither freed nor garbage collected.

| No, I blame the language for not allowing me to express my wishes.

  I can sympathize with this.  I frequently blame the planet earth for
  exhibiting more gravitation than I wish it had, and I wish magic worked,
  too, because I certainly do not want to do so much _work_ all the time.

  The wishes of good engineers are subjugated to the reality in which they
  need their wishes to come true.  The wishes of really bad engineers are
  completely disconnected from any reality.  I think yours are of the
  latter kind.

| I can, certainly, open files in octet mode - mind you, that's precisely
| what I do.  But then, I'm not interoperable with all the body of
| character-based text, SEXP or (shudder) XML processing.

  You really need to clue in that MD5 does not operate on characters.  That
  some languages have no idea what a character is, but treat it like a
  small integer, is really not something you can blame either MD5 or Common
  Lisp for.  If you really need a stream to exhibit characters while you do
  MD5 processing on its underlying byte stream, you can do that by creating
  a new stream class that reads from those 64-byte buffers that you read
  from the input source and give to MD5.  It is really not that hard if you
  want to _work_ with the language and avoid "wishing" for things that are
  incompatible with the language you use.
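
  A minimal sketch of that arrangement, in portable Common Lisp and under
  stated assumptions: MAKE-TAPPED-READER and BLOCK-FN are hypothetical
  names, BLOCK-FN stands in for a real MD5 update function, and the octets
  are taken to be ISO 8859-1 so that CODE-CHAR can decode them one at a
  time.  A real version would more likely be a stream class, which the
  standard does not say how to define, so a closure is used here instead.

```lisp
;; Sketch only.  BLOCK-FN is a placeholder for an MD5 update function.
(defun make-tapped-reader (octet-stream block-fn &key (block-size 64))
  "Return a closure that yields the next character from OCTET-STREAM, or
NIL at end of input.  Every freshly read buffer of octets is handed to
BLOCK-FN before any character is decoded from it."
  (let ((buffer (make-array block-size :element-type '(unsigned-byte 8)))
        (n 0)                           ; number of valid octets in BUFFER
        (i 0))                          ; index of the next octet to decode
    (lambda ()
      (when (= i n)
        ;; Buffer exhausted: refill it and show the raw octets to BLOCK-FN.
        (setf n (read-sequence buffer octet-stream)
              i 0)
        (when (plusp n)
          (funcall block-fn (subseq buffer 0 n))))
      (if (zerop n)
          nil
          ;; Assumption: one octet per character (ISO 8859-1).
          (prog1 (code-char (aref buffer i))
            (incf i))))))
```

  The hashing side sees exactly the octets that came from the input
  source, and the character side never has to be converted back to bytes.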

| It is certainly possible to build translation layers from one to the
| other, but it's clumsy, inefficient, not portable, and underspecified.

  MD5 is specified to work on blocks of 64 bytes.  How you get from what
  you have to 64-byte blocks is really a question of quality of programmer
  and implementation.  I still think you are simply massively incompetent.
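
  For a message that is already in memory as octets, getting to 64-byte
  blocks is mostly a matter of the padding rule from RFC 1321: one #x80
  octet, zero octets up to 8 short of a block boundary, then the message
  length in bits as 64 bits, least significant octet first.  The sketch
  below assumes that setting; MD5-PAD and MAP-BLOCKS are hypothetical
  names, and a streaming version would pad only the final block.

```lisp
(defun md5-pad (octets)
  "Return a fresh octet vector holding OCTETS followed by MD5 padding.
The length of the result is a multiple of 64."
  (let* ((len (length octets))
         ;; Zero octets needed after the #x80 marker so that exactly
         ;; 8 octets remain in the last block for the length field.
         (zeros (mod (- 55 len) 64))
         (total (+ len 1 zeros 8))
         (padded (make-array total :element-type '(unsigned-byte 8)
                                   :initial-element 0)))
    (replace padded octets)
    (setf (aref padded len) #x80)
    ;; Message length in bits, modulo 2^64, least significant octet first.
    (loop with bits = (ldb (byte 64 0) (* 8 len))
          for i below 8
          do (setf (aref padded (+ len 1 zeros i))
                   (ldb (byte 8 (* 8 i)) bits)))
    padded))

(defun map-blocks (fn padded)
  "Call FN on each successive 64-octet block of PADDED."
  (loop for start from 0 below (length padded) by 64
        do (funcall fn (subseq padded start (+ start 64)))))
```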

| Certainly, each implementation specifies (more or less precisely) what
| happens when you use code-char and such, but then, you have the same kind
| of incompatible extension hell as in Scheme.

  Wrong.

| Once again, reading their specification, I happen to like the recent
| things done by Franz (SIMPLE-STREAM), but it's unhappily not directly
| applicable to me.

  Well, what makes it impossible for you to take their good ideas and
  implement them on your own?

| For performance, lack of supported modular integer operators is also a
| big problem.

  I actually agree with this in general, but for MD5, just break the 32-bit
  integers in half and operate on 16-bit values, instead.  I assure you
  that no significant performance loss is caused by this, and the algorithm
  does not increase in complexity because of it.  This is a fairly simple
  engineering tradeoff that good engineers will do to get the work done and
  bad engineers will refuse to do because they wish they did not have to.
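
  As an illustration of the tradeoff, here is a sketch of the two
  operations MD5 needs most, addition modulo 2^32 and left rotation, done
  on 16-bit halves.  ADD32 and ROTL32 are hypothetical names, and a word
  is passed as two arguments, high half first.  No intermediate value
  exceeds 18 bits, so nothing here should need a bignum in an
  implementation with fixnums of at least that size.

```lisp
(defun add32 (a-hi a-lo b-hi b-lo)
  "Add two 32-bit words given as 16-bit halves.  Return the sum modulo
2^32 as two values: high half, low half."
  (let* ((lo (+ a-lo b-lo))
         ;; The carry out of the low half is bit 16 of LO.
         (hi (+ a-hi b-hi (ash lo -16))))
    (values (logand hi #xFFFF) (logand lo #xFFFF))))

(defun rotl32 (hi lo n)
  "Rotate the 32-bit word HI:LO left by N bits, 0 <= N < 32.  Return the
result as two values: high half, low half."
  ;; Rotating by 16 just swaps the halves.
  (when (>= n 16)
    (rotatef hi lo)
    (decf n 16))
  (flet ((half (x y)
           ;; The low 16-N bits of X move up by N; the top N bits of Y
           ;; come in at the bottom.
           (logior (ash (ldb (byte (- 16 n) 0) x) n)
                   (ash y (- n 16)))))
    (values (half hi lo) (half lo hi))))
```

  For example, (rotl32 #x1234 #xABCD 4) returns #x234A and #xBCD1, which
  is #x1234ABCD rotated left by four bits.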
  
| I consider it bad practice to post large code files on USENET.

  Well, if your MD5 function is a large code file, then you have even more
  problems.

| I posted the URL to that code, which ought to be enough for anyone
| interested.

  There is no way to ascertain that what is pointed to by a URL will remain
  the same after it has been criticized.  We have seen how some massively
  dishonest _frauds_ on this newsgroup have altered the text of published
  URLs (even without updating the version) in order to make their critics
  look bad.  Post the code and it will be very hard to "update" it.

| I repeat it here (and add a second one), in case you missed it:
|         ftp://Samaris.tunes.org/pub/lang/cl/fare/md5.lisp
|         http://tunes.org/cgi-bin/cvsweb/fare/fare/lisp/md5.lisp

  I find it rather odd that you cannot distill your problems down to a few
  simple cases.  Everybody can look elsewhere for the full context, but it
  _should_ be possible to show people your problems with an appropriate
  excerpt.  A good bug report contains a distilled example.  A bad bug
  report contains a million lines of rotten code with a single line "this
  code is perfect, but your compiler does not conform to my wishes".

| > | The world has standardized on low-level byte streams as the universal
| > | medium for communication of data, including text.
| >   Which is a mistake, since they run into an enormous amount of trouble
| >   with supporting more than one character encoding.
| It is not a mistake - it is the natural thing to do in a world of
| proprietary black-box devices and software.

  Huh?  You have a special knack for non sequiturs.  TEXT IS NOT BYTES.
  You _really_ need to understand this.  Failing that, you will run into
  all sorts of problems, and there will be no end to your complaints, as I
  suspect you have already noticed.
  
| I did it.  But it's not portably interoperable with the character-based
| SEXP code that I have.

  Really?  Tell you what, when I wrote my MD5 functions for Allegro CL 5.0,
  I stuffed them between the I/O system and the reader, and I reset and grab
  the MD5 hashes while reading characters from the stream.  If I can do
  this in Allegro CL, so can you in CMUCL.  If you naively implement the
  fairly stupid C model used for MD5, sending blocks of 64 "bytes" down to
  a new function, instead of grabbing the input buffers of the stream, and
  run into problems that you do not look into seriously in order to find a
  better way, you are simply incompetent at what you do and should not
  blame anyone else, _especially_ not the language.

| Using MD5 to portably support code version tagging, etc., becomes
| "interesting".  By no means impossible.  Just a PITA.

  I have no idea what you are trying to talk about.

| No I'm not.  And sometimes, I want to do only one.  Sometimes, I want to
| do only the other.  Sometimes, I am happy with the character processing
| done by my implementation (though it's not portable).  Sometimes, I need
| precise control on the processing that happens (e.g. because I'm
| precisely transcoding stuff from one protocol to another).  And
| sometimes, I want to do both byte-processing and text-processing at once
| on the same stream (at once: switching from one to the other, or even
| doing both character processing AND md5sum'ing on the same chunk).

  I honestly fail to see the problem.  In my view, it takes a fairly dense
  programmer to fail to deal with these things intelligently.  If you need
  both byte stream and character stream, as you would do in HTTP, there are
  two ways of doing that: so-called "bivalent" streams, from which you can
  read both bytes and characters, but which introduces serious problems in
  maintaining state information about non-trivial character codings, or
  some means to switch the type of the stream between byte and character,
  which communicates to the lower levels what you intend to do from now on.

  If you need precise control, ask for it.  Your system does _not_ do a lot
  of weird magic that you have no control over.  Just trust me on this, OK?
  Go read the fine documentation and discover for yourself that you _can_
  trust the implementation.

| In the latter cases, any implicit character processing done by the
| implementation is an abstraction inversion, to me.

  Yes, to you, because you have _already_ inverted the model.  You do _not_
  convert from bytes to characters to bytes in order to do MD5 hashing on
  the bytes -- the danger of losing the original byte values is too high no
  matter how you do things.  You have to get at the bytes where they are
  actually found, just after they have been read, and just before they are
  written.  Anything else is pretty damn stupid.
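
  A small sketch of how the original byte values can get lost.  The
  decoder and encoder here are hypothetical, not those of any particular
  implementation: DECODE-TEXT assumes ISO 8859-1 and folds CR LF into a
  single newline character, and ENCODE-TEXT writes a newline as a lone LF.
  Each is reasonable on its own, yet the round trip does not give back the
  octets that were read, so a hash of the re-encoded text would differ.

```lisp
(defun decode-text (octets)
  "Hypothetical decoder: one octet per character, except that the pair
CR LF becomes a single newline character."
  (with-output-to-string (out)
    (loop with i = 0
          while (< i (length octets))
          do (let ((o (aref octets i)))
               (cond ((and (= o 13)
                           (< (1+ i) (length octets))
                           (= (aref octets (1+ i)) 10))
                      (write-char #\Newline out)
                      (incf i 2))
                     (t
                      (write-char (code-char o) out)
                      (incf i)))))))

(defun encode-text (string)
  "Hypothetical encoder: one octet per character, newline written as LF."
  (map '(vector (unsigned-byte 8))
       (lambda (c) (if (char= c #\Newline) 10 (char-code c)))
       string))

(defun round-trip-preserves-octets-p (octets)
  "True if decoding and re-encoding OCTETS yields the same octets."
  (equalp octets (encode-text (decode-text octets))))
```

  With these definitions, the four octets 97 13 10 98 come back as the
  three octets 97 10 98.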

| >   if you are dealing with text, you deal with characters, not their
| >   coding,
| Not even. I could be dealing with words, with sentences, with layout
| elements. In such contexts, characters are too low-level and not what I
| like.

  I am sure you think this is relevant to something.  Could you try to make
  it a bit more clear what it might be relevant to?  No, never mind.

| Yet I don't resent of CL not standardizing on high-level protocols --
| it's things that can easily be implemented on top of them.  However,
| wrongly standardizing on low-level things while adding restrictions to
| them is an abstraction inversion and it's wrong - it's the language
| getting in the way rather than helping.

  Have you at all considered that _you_ might be wrong in any of this?  If
  not, I would actually like to hear your arguments for why your way of
  seeing things is correct.  I am getting tired of your non sequiturs and
  the randomness of your "conclusions".

| I'm sorry I don't speak norvegian (yet).  It means precisely what it says.

  Oh, geez, you really _are_ a typically French retard.  Well, thank you
  for proving my guess that your whining is due to your incredible
  incompetence and the arrogance found only in rabid ignorants who have no
  desire to listen to anyone other than the voices in their head.  You
  really made it clear that you lack the ability to express yourself in
  English, but when you resort to the kind of low-level idiocy you do, all
  hope is lost that you will ever recover enough of your thinking ability
  to get back on track.

  Of _course_ Common Lisp does not match your wishes, Francois-Rene Rideau.
  If it did, it would be a really stupidly designed language.  I am happy
  you were not around when things _were_ designed.

  Incompetence on your scale should be punishable by law.

  If you think you had some valid concerns about the language, back up and
  do a better job of presenting them, unoccluded by your stupidity and
  incompetence and your "wishes".  I am quite sure you will find people
  sympathetic to a number of language design issues when it comes to coding
  stuff like MD5, but a good engineer knows when to use languages that are
  better for some particular tasks than others.  Bad engineers should just
  be encouraged to become better at what they do, or to switch careers to
  one in which they _could_ become good at what they do.

///
-- 
  Norway is now run by a priest from the fundamentalist Christian People's
  Party, the fifth largest party representing one eighth of the electorate.
-- 
  Carrying a Swiss Army pocket knife in Oslo, Norway, is a criminal offense.