Subject: Re: fuzzy string matcher in Lisp
From: rpw3@rigden.engr.sgi.com (Rob Warnock)
Date: 1999/01/30
Newsgroups: comp.lang.lisp
Message-ID: <78unuc$r18q@fido.engr.sgi.com>
Dieter Menszner <menszner@t-online.de> wrote:
+---------------
| I am looking for a Lisp function which does a fuzzy string match
...
| Does anybody know of some downloadable source code ?
+---------------

It's in C, not Lisp, but looking at it might get you some algorithms
you could use:

	ftp://ftp.cs.arizona.edu/agrep/README
	ftp://ftp.cs.arizona.edu/agrep/agrep-2.04.tar.Z

From the README:

	...
	The three most significant features of agrep that are not supported
	by the grep family are 
	1) the ability to search for approximate patterns;
	    for example, "agrep -2 homogenos foo" will find homogeneous as well 
	    as any other word that can be obtained from homogenos with at most 
	    2 substitutions, insertions, or deletions.
	    "agrep -B homogenos foo" will generate a message of the form
	    best match has 2 errors, there are 5 matches, output them? (y/n)
	...
	The tar file contains the source code (in C), man pages (agrep.1),
	and two additional files, agrep.algorithms and agrep.chronicle,
	giving more information.
	The agrep directory also includes two postscript files: 
	agrep.ps.1 is a technical report from June 1991 
	describing the design and implementation of agrep;
	agrep.ps.2 is a copy of the paper as it appeared in the 1992
	Winter USENIX conference.

I have a script "word", which just runs "agrep" on "/usr/lib/dict/words",
that I find *very* handy when I can't remember how to spell something.
To use the example word given above:

	% agrep -2 homogenos /usr/lib/dict/words
	homogenate
	homogeneity
	homogeneous
	inhomogeneity
	inhomogeneous
	% 
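If you would rather reimplement the idea than port agrep, here is a minimal sketch of the matching rule the README describes. It is in Python rather than Lisp. A line matches if some *substring* of it is within k substitutions, insertions, or deletions of the pattern, which is why "inhomogeneity" shows up above. This uses the textbook dynamic-programming edit distance (Sellers' variant, where a match may start anywhere in the line). It is *not* the bit-parallel technique agrep itself is built on. The names `approx_substring_distance` and `agrep_like` are made up for illustration:

```python
def approx_substring_distance(pattern: str, text: str) -> int:
    """Return the fewest substitutions, insertions, or deletions needed to
    turn `pattern` into some substring of `text` (Sellers' dynamic program)."""
    m = len(pattern)
    # prev[i] = cheapest way to match pattern[:i] against a substring of text
    # ending just before the current character.  Before any text is read,
    # matching pattern[:i] costs i deletions.
    prev = list(range(m + 1))
    best = prev[m]
    for ch in text:
        # cur[0] = 0: a match may begin at any position in the text.
        cur = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            cur[i] = min(prev[i - 1] + cost,  # match or substitution
                         prev[i] + 1,         # extra character in text (insertion)
                         cur[i - 1] + 1)      # pattern character dropped (deletion)
        best = min(best, cur[m])
        prev = cur
    return best


def agrep_like(pattern: str, lines, max_errors: int):
    """Yield each line containing an approximate match of `pattern` with at
    most `max_errors` errors, roughly in the spirit of `agrep -N pattern`."""
    for line in lines:
        if approx_substring_distance(pattern, line) <= max_errors:
            yield line


if __name__ == "__main__":
    # A tiny stand-in for a real word list such as /usr/lib/dict/words.
    sample = ["homogenate", "homogeneity", "homogeneous",
              "inhomogeneity", "inhomogeneous", "apple", "zebra"]
    for word in agrep_like("homogenos", sample, 2):
        print(word)
```

Only two columns of the table are kept at a time, so memory grows with the pattern length alone. A Lisp version needs just two vectors and a pair of nested loops. Expect it to be far slower than agrep on a full dictionary, since every line costs time proportional to pattern length times line length.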

Hope that helps,


-Rob

-----
Rob Warnock, 8L-855		rpw3@sgi.com
Applied Networking		http://reality.sgi.com/rpw3/
Silicon Graphics, Inc.		Phone: 650-933-1673
2011 N. Shoreline Blvd.		FAX: 650-964-0811
Mountain View, CA  94043	PP-ASEL-IA