Detecting Modified Files With CRCs

A collection of information about Vault, including solutions to common problems.

Post by sterwill » Mon Sep 27, 2004 9:09 am

Vault 3.0 introduces a client-side feature that uses a Cyclic Redundancy Check (CRC), a type of checksum, to detect modifications to files in working folders. Vault reads every byte of a potentially modified file to compute its CRC value, then compares that value against the CRC value of the unmodified file. This check is much more accurate than comparing filesystem modification times. It is also much slower, because the file's contents must be read for every check.
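A minimal sketch of the check described above, in Python, assuming the CRC-32 of the unmodified version is already known and simply passed in (how Vault stores and retrieves it is not shown). The function names `file_crc32` and `is_modified_by_crc` are hypothetical. The standard library's `zlib.crc32` computes the same CRC-32 variant used by ZIP and PNG, and the file is read in chunks so that every byte is processed without loading the whole file into memory.

```python
import zlib


def file_crc32(path, chunk_size=64 * 1024):
    """Compute the CRC-32 of a file by reading every byte in chunks."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # zlib.crc32 accepts the running value, so chunks can be fed in order.
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF  # normalize to an unsigned 32-bit value


def is_modified_by_crc(path, baseline_crc):
    """Hypothetical check: the file is modified if its CRC differs from the baseline."""
    return file_crc32(path) != baseline_crc
```

Because `zlib.crc32` takes the running CRC as its second argument, the chunked result is identical to hashing the whole file at once; the chunk size only affects memory use.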

Vault uses CRC-32 (also used by Ethernet, the ZIP and PNG file formats, and others). While CRC-32 is not a cryptographically strong hash function, it is sufficient for detecting random changes to working folder files, and it is fast and easy for Vault to compute. A CRC-32 value is also small (4 bytes), so it is easy to store in the repository database for every version of every file.
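For reference, the CRC-32 variant used by Ethernet, ZIP, and PNG can be computed one bit at a time with the reflected polynomial 0xEDB88320, an initial register value of 0xFFFFFFFF, and a final XOR with 0xFFFFFFFF. The sketch below is a slow, table-free illustration written for this post, not Vault's implementation; it should agree with Python's `zlib.crc32`, and the result packs into exactly 4 bytes.

```python
import struct
import zlib

POLY = 0xEDB88320  # reflected form of the CRC-32 polynomial 0x04C11DB7


def crc32_bitwise(data: bytes) -> int:
    """Bit-at-a-time CRC-32 (Ethernet / ZIP / PNG variant)."""
    crc = 0xFFFFFFFF  # initial register value
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; if the bit shifted out was 1, XOR in the polynomial.
            if crc & 1:
                crc = (crc >> 1) ^ POLY
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF  # final XOR


check = crc32_bitwise(b"123456789")
print(hex(check))                          # 0xcbf43926, the standard check value
print(check == zlib.crc32(b"123456789"))   # True
print(len(struct.pack("<I", check)))       # 4 (bytes needed to store the value)
```

Production code normally uses a 256-entry lookup table (or a library such as zlib) to process a byte at a time instead of a bit at a time; the output is the same.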

What do CRCs do for me?
Vault has historically used only filesystem modification times to determine whether a file has been modified. This method resulted in many false positives (files that are not actually modified, but have newer timestamps) and a few false negatives (files that really were modified, but have had their modification times set back to the original values).
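To illustrate the two failure modes described above, here is a hypothetical timestamp-only check (not Vault's code) that treats a file as modified when its modification time differs from a recorded baseline. It uses `os.utime` to produce a false positive (timestamp changed, contents identical) and a false negative (contents changed, timestamp restored).

```python
import os
import tempfile


def is_modified_by_mtime(path, baseline_mtime_ns):
    """Hypothetical timestamp-only check: modified if the mtime differs."""
    return os.stat(path).st_mtime_ns != baseline_mtime_ns


fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"original contents")
baseline = os.stat(path).st_mtime_ns

# False positive: move the timestamp forward 10 s without touching the contents.
os.utime(path, ns=(baseline, baseline + 10_000_000_000))
touched_only = is_modified_by_mtime(path, baseline)
print(touched_only)       # True, although the contents are unchanged

# False negative: change the contents, then set the timestamp back.
with open(path, "wb") as f:
    f.write(b"modified contents")
os.utime(path, ns=(baseline, baseline))
edited_then_reset = is_modified_by_mtime(path, baseline)
print(edited_then_reset)  # False, although the contents changed

os.remove(path)
```

A content-based check such as CRC-32 gives the correct answer in both cases, because it ignores the timestamp entirely.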

CRCs completely eliminate false positive detections (two files that are identical will always have the same CRC value). They also drastically reduce false negative detections, since two different files have only about a 1 in 4,294,967,296 (2^32) chance of producing the same CRC value. When CRCs are enabled in Vault, the size of a working folder file is always compared to the stored size of its unmodified version. If the sizes differ, the CRC check is skipped to save time, because the file is definitely modified.
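The size shortcut described above can be sketched as follows, assuming the unmodified version's size and CRC-32 are both available; the function and parameter names are hypothetical, not Vault's. Comparing sizes costs a single `stat` call, and the file is read in full only when the sizes match.

```python
import os
import zlib


def file_crc32(path, chunk_size=64 * 1024):
    """CRC-32 of a file, read in chunks."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc


def is_modified(path, baseline_size, baseline_crc):
    """Hypothetical size-then-CRC check."""
    if os.path.getsize(path) != baseline_size:
        return True  # different size: definitely modified, so skip the CRC
    # Same size: read every byte and compare CRC values.
    return file_crc32(path) != baseline_crc
```

A same-size edit (for example, overwriting one byte) still reaches the CRC comparison, which is where CRC-32 catches the changes that a size comparison alone would miss.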

When would I not want to use CRCs?
CRCs can be slow to calculate for very large files, or for working folders stored on slow storage media (network shares, USB drives, etc.). If most of your working folders are stored on slow media, you may not want to enable CRCs. If you have large files in your working folders that are slowing down the CRC tests, you can adjust the size threshold in the Options dialog. See the Other CRC Options section below.

How do I enable CRC tests?
CRCs are disabled by default for all users in the Vault client. To enable them, open the Options dialog (under the Tools menu) in the Windows Forms client. Select the Local Files section from the list on the left side of the dialog, and select the check box titled "Detect modified files using CRCs instead of modification times." When you click OK to close the dialog, the feature is enabled.

Other CRC Options
Computing CRC values for large files can take a long time, because each byte of the file must be read. Vault can skip computing CRCs for files over a certain size; the modification times will be compared for these files instead. The default threshold is 10 MB. This can be adjusted in the Options dialog of the Windows Forms client (the option is titled "Only use CRC checks for files smaller than...").

To disable the threshold so that all files are checked using CRCs, clear that check box.
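A sketch of how such a threshold might be applied, using hypothetical names throughout. Several details are assumptions not stated in the post: "10 MB" is taken as 10 * 1024 * 1024 bytes, files at or above the threshold fall back to timestamps (since the option reads "smaller than"), the size comparison still runs first for large files, and a threshold of `None` stands in for the cleared check box.

```python
import os
import zlib

DEFAULT_THRESHOLD = 10 * 1024 * 1024  # assumed meaning of "10 MB"


def file_crc32(path, chunk_size=64 * 1024):
    """CRC-32 of a file, read in chunks."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc


def is_modified(path, baseline_size, baseline_crc, baseline_mtime_ns,
                threshold=DEFAULT_THRESHOLD):
    """Hypothetical check: CRC for files under the threshold, mtime otherwise.

    threshold=None models the cleared check box: every file gets a CRC check.
    """
    st = os.stat(path)
    if st.st_size != baseline_size:
        return True  # size differs: definitely modified
    if threshold is not None and st.st_size >= threshold:
        # Too large to CRC cheaply: compare modification times instead.
        return st.st_mtime_ns != baseline_mtime_ns
    return file_crc32(path) != baseline_crc
```

With a small file whose timestamp has changed but whose contents have not, a tiny threshold routes it to the timestamp comparison (reported as modified), while the default threshold or `None` routes it to the CRC comparison (reported as unmodified).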
Shaw Terwilliger
SourceGear LLC
`echo sterwill5sourcegear6com | tr 56 @.`