Subject: Re: To diff or not to diff
From: rpw3@rpw3.org (Rob Warnock)
Date: Mon, 16 Aug 2004 06:06:22 -0500
Newsgroups: comp.lang.lisp
Message-ID: <rLCdnTcwc6izCL3cRVn-iw@speakeasy.net>
rem642b@Yahoo.Com <RobertMaas@YahooGroups.Com> wrote:
+---------------
| > From: Gareth McCaughan <gareth.mccaughan@pobox.com>
| > Disc [sic] space is cheap but network bandwidth isn't. You're going
| > to need diffs somewhere in the system.
| 
| That might be misleading. You don't actually need to maintain any
| static diffs as files anywhere. It's sufficient to do a remote-compare
| as needed to reconcile versions that have drifted from each other
| and/or to verify that two alleged identical base versions of a file
| actually are identical. I conceived the algorithm way back when the
| fastest modem in ordinary use was 1200 baud (but I had only 300 baud at
| the time) and the fastest net backbone was 56k bps. I never found
| anybody interested in my algorithm at the time...  The basic idea
| is to do a checksum of the whole file, so if they agree then
| presumably the files are indeed identical, but do it in such a way that
| checksums of the pieces are available already, so pieces can be
| compared to see which pieces match and which don't.
+---------------

The "rsync" program [a standard utility on Linux and xxxBSD and most
other Unixes, see <http://samba.anu.edu.au/rsync/features.html>]
does exactly this, although I believe its unit of granularity is
from the first change through the end of the file. Still, it does
a *very* nice job on files which are mostly appended to (such as
programs during development, mailboxes, and archives of saved mail
or netnews), as well as handling large trees of the same.
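
For concreteness, here is a minimal Python sketch of the idea in the
quoted paragraph: checksum the file in pieces, derive the whole-file
checksum from the piece checksums, and compare piece by piece only
when the whole-file values disagree. This is not rsync's actual
implementation, and not necessarily the quoted poster's algorithm
either. The names (BLOCK_SIZE, piece_digests, whole_digest,
differing_pieces), the 4096-byte piece size, and the choice of
SHA-256 are all assumptions made for illustration.

```python
import hashlib

# Hypothetical piece size; the post does not specify one.
BLOCK_SIZE = 4096


def piece_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Return one SHA-256 digest per fixed-size piece of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def whole_digest(pieces: list) -> bytes:
    """Derive a whole-file checksum from the piece checksums,
    so the piece checksums are already on hand when it is computed."""
    return hashlib.sha256(b"".join(pieces)).digest()


def differing_pieces(local: bytes, remote_pieces: list,
                     block_size: int = BLOCK_SIZE) -> list:
    """Return the indices of pieces that differ between the local data
    and a remote copy known only by its list of piece digests."""
    local_pieces = piece_digests(local, block_size)
    # Cheap first test: if the whole-file checksums agree, the files
    # are presumed identical and nothing more needs to be compared.
    if whole_digest(local_pieces) == whole_digest(remote_pieces):
        return []
    n = max(len(local_pieces), len(remote_pieces))
    return [
        i for i in range(n)
        if i >= len(local_pieces)
        or i >= len(remote_pieces)
        or local_pieces[i] != remote_pieces[i]
    ]


if __name__ == "__main__":
    old = b"a" * 10000
    new = old + b"appended"
    # Only the indices of mismatched pieces would need to be re-sent.
    print(differing_pieces(new, piece_digests(old)))
```

Note that this sketch compares pieces at fixed offsets only, so an
insertion near the start of a file shifts every later piece and makes
all of them mismatch. It behaves well mainly for the append-mostly
files mentioned above.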


-Rob

-----
Rob Warnock			<rpw3@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607