Developer Question: External Content in Software Repository?
Developer's Cave, Potential RPG June 16th, 2008, 1:37pm
Here’s a question for software developers out there (independent game developers and otherwise):
Is it best to keep external content in the software repository?
By external content, I refer to images, generated data files, and other binaries that are required to run the software. By software repository, I mean any system that stores the project files and provides revision control (version control). For the record, I’m using Subversion.
I think the general tendency is to use the repository only for source code, because that is its primary purpose. I felt (note the past tense) that bloating the repository with binaries overstepped the bounds of the repository’s responsibilities.
However, keeping content separate makes it difficult to produce a fully functional application, given some revision (version) of the software. That is, I can dump a version of my game software back to day one, but it will not run because I don’t have the necessary content files. It would not be practical to generate new content for an old version, even to run a small test. I could probably find a project backup containing the necessary content, but that is not a real solution.
As with most things, the best answer is also the simplest:
Place all project files in the software repository.
At least, this is the conclusion that I’ve drawn so far for this project. There are certainly alternatives, such as regularly archiving a full executable version, built from the repository and separate content. I’d really like to know what other developers have experienced on this matter.




June 16th, 2008 at 2:28 pm
As illustrated very nicely by the ClearCase trainer that came here a few years ago, CM systems are basically three-dimensional filesystems, where the third dimension is time. Your content has a direct chronological tie to your implementation, so I see nothing wrong or impure about putting content in the repository.
The only objections would be:
1. Philosophical - from a purist standpoint, the repository doesn’t seem like the right place for data. A database seems more appropriate, but is ultimately not a good solution. However, a filesystem is, in fact, a database, and a 3-D database is a good place for data that changes over time.
2. Storage - Keeping every revision of data could eat up disk space.
I would not consider these strong enough objections to rule out your simple solution: put it all in the repository.
June 17th, 2008 at 10:07 am
Good points. Regarding binary storage space, ClearCase is certainly a beast of a revision control system; I have no idea how it actually stores data. CVS just keeps a copy of each new binary file version.
Subversion uses a binary diff algorithm to more efficiently store just the bytes that have changed. The efficiency is dependent on how the binary changes from one version to the next.
For example, does a small change to a large image allow Subversion to store just a few new bytes, or does the image’s file format cause extensive restructuring of the data, forcing Subversion to store (almost) a whole new copy?
I wonder if there are any statistics on how well this performs…
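Lacking real statistics, here is a rough way to poke at the question. This is only a sketch: the "image" is made-up byte data, zlib stands in for whatever compression an image format might apply, and counting differing byte positions is a crude proxy for delta size, not the algorithm Subversion actually uses (a real binary delta can also recognise blocks that merely shifted).

```python
import zlib


def changed_positions(old: bytes, new: bytes) -> int:
    """Crude proxy for delta size: differing positions plus any length difference."""
    differing = sum(a != b for a, b in zip(old, new))
    return differing + abs(len(old) - len(new))


# Hypothetical stand-in for uncompressed pixel data (64 KiB).
raw_v1 = bytes((i * 31 + (i >> 8) * 7) % 256 for i in range(65536))

# "Edit the image": flip a single byte.
edited = bytearray(raw_v1)
edited[100] ^= 0xFF
raw_v2 = bytes(edited)

# The same two versions as a compressed file format would store them.
packed_v1 = zlib.compress(raw_v1)
packed_v2 = zlib.compress(raw_v2)

raw_changed = changed_positions(raw_v1, raw_v2)
packed_changed = changed_positions(packed_v1, packed_v2)

print(f"raw:    {raw_changed} of {len(raw_v1)} bytes differ")
print(f"packed: {packed_changed} of {len(packed_v1)} bytes differ")
```

The uncompressed pair differs in exactly the one byte that was edited. The compressed pair has to differ somewhere too, and how widely the change spreads depends on the data and the compressor, which is the whole question. Running the same comparison on real asset files, before and after a typical edit, would give a better feel for it.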
June 17th, 2008 at 12:48 pm
Interesting. As you have described it, there are benefits and trade-offs to each approach. Storing incremental data saves space, but I imagine that reconstructing a file incrementally would become processing-intensive as the history grew.
It probably stores the most recent version as a whole copy and then each historical reference describes what you change in the current version to get back to that historical version. Access would be slow for historical versions, but good for current versions.
I’m sure looking it up would answer the question definitively, but it is more fun to speculate.
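In the spirit of speculation, the layout guessed at above (full latest copy, backward deltas for history) might look something like this toy sketch. Everything here is hypothetical: the class and function names are invented, and the delta format (keep a shared prefix and suffix, replace the middle) is far simpler than anything a real revision control system would use. It is not a description of how Subversion stores data.

```python
from typing import List, Tuple

Delta = Tuple[int, int, bytes]  # (shared prefix length, shared suffix length, new middle)


def make_delta(source: bytes, target: bytes) -> Delta:
    """Describe how to turn source into target by replacing only the middle."""
    limit = min(len(source), len(target))
    prefix = 0
    while prefix < limit and source[prefix] == target[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and source[-1 - suffix] == target[-1 - suffix]:
        suffix += 1
    return prefix, suffix, target[prefix:len(target) - suffix]


def apply_delta(source: bytes, delta: Delta) -> bytes:
    prefix, suffix, middle = delta
    return source[:prefix] + middle + source[len(source) - suffix:]


class ReverseDeltaStore:
    """Keeps the newest revision whole; older ones are reached by walking backward."""

    def __init__(self, first: bytes) -> None:
        self.latest = first
        # back_deltas[i] turns revision i + 1 into revision i.
        self.back_deltas: List[Delta] = []

    def commit(self, new: bytes) -> int:
        self.back_deltas.append(make_delta(new, self.latest))
        self.latest = new
        return len(self.back_deltas)  # number of the revision just stored

    def checkout(self, revision: int) -> Tuple[bytes, int]:
        """Return the revision's contents and how many deltas had to be applied."""
        data = self.latest
        steps = 0
        for delta in reversed(self.back_deltas[revision:]):
            data = apply_delta(data, delta)
            steps += 1
        return data, steps


store = ReverseDeltaStore(b"hello world")            # revision 0
store.commit(b"hello brave world")                   # revision 1
store.commit(b"hello brave new world")               # revision 2

newest, newest_steps = store.checkout(2)
oldest, oldest_steps = store.checkout(0)
```

Checking out the newest revision applies no deltas at all, while revision 0 needs one delta per later commit, which matches the guess that access would be fast for current versions and slower the further back you go.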
June 17th, 2008 at 1:28 pm
Storing the full latest copy would be good for recoverability. However, Subversion stores its diffs the other way around. According to the (excellent) user manual: