Subject: Re: To diff or not to diff
From: (Rob Warnock)
Date: Mon, 16 Aug 2004 06:06:22 -0500
Newsgroups: comp.lang.lisp
Message-ID: <>
rem642b@Yahoo.Com <RobertMaas@YahooGroups.Com> wrote:
| > From: Gareth McCaughan <>
| > Disc [sic] space is cheap but network bandwidth isn't. You're going
| > to need diffs somewhere in the system.
| That might be misleading. You don't actually need to maintain any
| static diffs as files anywhere. It's sufficient to do a remote-compare
| as needed to reconcile versions that have drifted from each other
| and/or to verify that two allegedly identical base versions of a file
| actually are identical. I conceived the algorithm way back when the
| fastest modem in ordinary use was 1200 baud (but I had only 300 baud at
| the time) and the fastest net backbone was 56k bps. I never found
| anybody interested in my algorithm at the time...  The basic idea
| is to do a checksum of the whole file, so if they agree then
| presumably the files are indeed identical, but do it in such a way that
| checksums of the pieces are available already, so pieces can be
| compared to see which pieces match and which don't.

The "rsync" program [a standard utility on Linux and xxxBSD and most
other Unixes, see <>]
does exactly this, although I believe its unit of granularity is
from the first change through the end of the file. Still, it does
a *very* nice job on files which are mostly appended to (such as
programs during development, mailboxes, and archives of saved mail
or netnews), as well as handling large trees of same.
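
For what it's worth, the quoted idea (a whole-file checksum built so
that the checksums of the pieces come for free) can be sketched in a
few lines of Python. This is only an illustration of that idea, not
rsync's actual protocol. The fixed 4096-byte piece size, the choice of
SHA-256, and all the function names are my own assumptions.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical piece size, chosen only for illustration


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Return one SHA-256 digest per fixed-size piece of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def whole_digest(digests: list[bytes]) -> bytes:
    """Whole-file checksum derived from the per-piece digests."""
    return hashlib.sha256(b"".join(digests)).digest()


def differing_blocks(local: list[bytes], remote: list[bytes]) -> list[int]:
    """Indices of pieces that differ, including pieces only one side has."""
    # Cheap first step: if the whole-file checksums agree, stop here.
    if whole_digest(local) == whole_digest(remote):
        return []
    longest = max(len(local), len(remote))
    return [
        i for i in range(longest)
        if i >= len(local) or i >= len(remote) or local[i] != remote[i]
    ]


old = b"a" * 10000
new = old + b"tail"  # a file that was only appended to
print(differing_blocks(block_digests(old), block_digests(new)))
```

Only the digests need to cross the wire. Each side computes its own
list, the two whole-file checksums are compared first, and the
per-piece lists are exchanged only when those disagree. With fixed
piece boundaries like these, an insertion near the start of a file
shifts every later piece, so this sketch works best on files that are
mostly appended to.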


Rob Warnock			<>
627 26th Avenue			<URL:>
San Mateo, CA 94403		(650)572-2607