Performance Impact of Number of Files

AjarnMark
Posts: 60
Joined: Mon Oct 29, 2007 4:22 pm
Location: Seattle, WA

Performance Impact of Number of Files

Post by AjarnMark » Tue Mar 10, 2009 9:35 am

In this other thread Jeremy made a comment about areas of performance that are impacted by the number of files. Since this is really a different subject than the main point over there, I thought I'd start a new thread to ask follow-up questions.
  • Are there recommended guidelines regarding repository size or folder size in terms of number of files, database size or other metrics?
  • Do branches count differently (if I understand correctly, a branch is a pointer and check-ins are diffs, so neither is a full copy of a file)?
  • Which areas are impacted the most? It makes sense that a recursive GET is impacted by the number of files, but what else?
Thanks!

lbauer
Posts: 9736
Joined: Tue Dec 16, 2003 1:25 pm
Location: SourceGear

Re: Performance Impact of Number of Files

Post by lbauer » Tue Mar 10, 2009 3:58 pm

We don't have specific guidelines because so much depends on the size of the team, the hardware, the network, and how Vault is used.

Generally, smaller repositories are faster than very large repositories. A folder tree with a few folders will be faster than a tree with hundreds of folders. A 5 GB database is easier to manage, back up, etc. than a 50 GB database.

But the fact is, guidelines can't dictate the size of your tree/repository/database. Ultimately, the size is determined by the needs of your project(s) and team.
Sometimes you don't have a choice but to have 1000 files in a folder.

But if you do have a choice, split projects up into different repositories, branch only when necessary, put fewer items in each folder, and avoid storing very large files in Vault unless you need to.
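
As a rough way to see where a tree stands against this advice, here is a minimal Python sketch that walks a local working folder and lists folders holding many files and files that are unusually large. It only looks at files on disk, not at the Vault repository or database itself. The function name `survey_tree` and both default thresholds are assumptions chosen for illustration, not SourceGear recommendations.

```python
import os
import sys


def survey_tree(root, max_files_per_folder=1000, max_file_bytes=100 * 1024 * 1024):
    """Scan a local folder tree (hypothetical helper, not part of Vault).

    Returns two lists:
      crowded_folders: (folder_path, file_count) for folders above the file limit
      large_files:     (file_path, size_in_bytes) for files above the size limit
    Both limits are illustrative assumptions; pick values that suit your team.
    """
    crowded_folders = []
    large_files = []
    for folder, _subdirs, files in os.walk(root):
        # Count only the files directly in this folder, not in its subfolders.
        if len(files) > max_files_per_folder:
            crowded_folders.append((folder, len(files)))
        for name in files:
            path = os.path.join(folder, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # Skip files that disappear or cannot be read during the scan.
                continue
            if size > max_file_bytes:
                large_files.append((path, size))
    return crowded_folders, large_files


if __name__ == "__main__":
    # Usage: python survey_tree.py <working-folder>
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    crowded, large = survey_tree(root)
    for folder, count in crowded:
        print(f"{count} files in {folder}")
    for path, size in large:
        print(f"{size} bytes: {path}")
```

Anything it reports is only a candidate for splitting up or moving out; as noted above, sometimes the project simply needs that many files in one place.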

BTW: Branches are "lightweight" but still add complexity to the repository.

When users report performance issues with their Vault installation, it's generally due to slowdowns in retrieving the repository tree information, checking out files, etc. These operations cause the Vault server to synchronize with the client's information about the tree, checkout lists, etc., and that can take time. Gets are not so much of an issue, except perhaps the first time files are retrieved to the local machine.
Linda Bauer
SourceGear
Technical Support Manager
