Subject: Re: MD5 in LISP and abstraction inversions
From: Erik Naggum <erik@naggum.net>
Date: Thu, 01 Nov 2001 13:27:56 GMT
Newsgroups: comp.lang.lisp
Message-ID: <3213610074551907@naggum.net>

* Lieven Marchand <mal@wyrd.be>
| You've done the work on md5 and I haven't so this is guess work, but my
| first thought where the problem would lie is that a lot of these
| algorithms assume efficient 32 bit operations, which in CL will
| cons. Given sufficient declarations the character->(unsigned-byte 8)
| conversions with char-code should be negligable.

  MD5 seems to be designed with hardware implementations in mind.  This is
  not a bad idea, but it tends to make software implementations weirder.

  I have written an MD5 function in Allegro CL, using several low-level
  features (which are still at a higher level than C) and decided to split
  the 32-bit numbers in two 16-bit parts.  Emacs has done the same thing,
  but in a slightly different way.  I wanted to add support for a bitwise
  rotate function that would end up using such instructions if available,
  but it is probably easier to write MD5 in assembly on each platform than
  to optimize it better.  In the application I used this, MD5 hashes were a
  serious bottleneck, so it had to be better than using FFI to a C function.

  After having worked with the MD5 functions on and off for a month or so,
  I came to conclude that it _should_ be written in assembly, and started
  to do that, but it turned out to be extremely time-consuming work.  The
  Intel processors are shy too many registers, so all the fun in trying to
  make it superfast was replaced by increasing frustration over the design
  of those processors.  Still, initial estimates indicated that a hand-
  tuned assembly version would be about twice as fast as the code that gcc
  produced for the naive implementation found in the RFC, so it would be
  worth it at significant cost.

  I also think the new streams design from Franz Inc makes for a good way
  to deal with the 64-byte buffers.  It is wrong to try to work with MD5 at
  the character level, and the typical use of MD5 is to ensure that some
  data stream is intact.  If the application that consumes the stream can
  do both MD5 hashing and its real work at the same time, and can roll back
  the work if the MD5 hash turns out bad, much will be saved compared to
  making two passes, since we must assume that the MD5 hash is usually good.

///
-- 
  Norway is now run by a priest from the fundamentalist Christian People's
  Party, the fifth largest party representing one eighth of the electorate.
-- 
  Carrying a Swiss Army pocket knife in Oslo, Norway, is a criminal offense.