Introduction to Subversion

by Tony Kay

Subversion is a relatively new version control system which is meant to be an improvement on CVS. Both of these tools are intended to allow multiple developers to check out working copies of source code so that concurrent development can occur on different sections of an application (or document).

I have been using Subversion for over a year now, and have found it to be useful in any situation where I want to keep track of changes to files. Specifically, I use it in the following situations:

Subversion or CVS?

Subversion is a new project, and CVS has been around a while and is widely used. So what are the reasons to use Subversion?

Alternatives to CVS and Subversion

The world is never one-size-fits-all. There are many ways to keep track of changes to files. One way is to do daily backups and keep them for a couple of years. I would argue that this particular choice is inferior in most respects: almost no one is willing to bother with backups every day, and finding the right backup that contains the file you want can be a real chore.

Subversion is meant to be a general-purpose tool. It keeps track of files and changes to those files. If you are looking for it to do more, then it might be a good idea to check for tools that are more specialized for your needs. A popular choice for web developers is a content management system like OpenCMS (www.opencms.org), which not only versions your files but is also tuned for web page development. The disadvantage of specialized tools is that they may be less adept at version control, may be more resource intensive, and may be more complex to install and configure.

So How Does It Work?

The basic idea is that there is a central repository that keeps track of a group of files, including every change that is made to them. Individual developers check out a copy of these files, make changes to their private copy, and when they have something useful to submit they commit their changes back to the central store. Changes made by one developer cannot be seen by anyone else until they commit them.

Committed changes can be pulled into someone's working copy at any time. This allows a developer to merge the changes made by others into their own working copy so they can track the progress of the project as a whole.
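The cycle described above can be sketched with a throwaway local repository. This is only an illustration: the sandbox directory and the working copy names alice and bob are hypothetical, and the sketch assumes the svn and svnadmin commands are installed.

```shell
set -e
# Hypothetical scratch area; nothing here touches real data.
SANDBOX=$(mktemp -d)
svnadmin create "$SANDBOX/repo"
REPO_URL="file://$SANDBOX/repo"

# Two developers each check out a private working copy.
svn checkout -q "$REPO_URL" "$SANDBOX/alice"
svn checkout -q "$REPO_URL" "$SANDBOX/bob"

# Alice adds a file. Bob cannot see it until she commits.
echo "first draft" > "$SANDBOX/alice/notes.txt"
svn add -q "$SANDBOX/alice/notes.txt"
svn commit -q -m "Add notes" "$SANDBOX/alice"

# Bob pulls the committed change into his working copy.
svn update -q "$SANDBOX/bob"
cat "$SANDBOX/bob/notes.txt"
```

Until the svn update runs, the bob working copy contains no notes.txt at all, which is the isolation the text describes.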

Of course, there is nothing that says you have to use Subversion with more than one developer. I use it all the time for things that are unique to my environment.

For example, I store most of my UNIX home directory in Subversion. This allows me to keep tabs on all of my important files, and even allows me to share all or part of my home directory among my machines, such as my desktop and laptop. When I make a change to a file, I commit the changes to the central repository and it is then available for checkout on my other machines.

Storing these files in a versioned system also gives me the ability to recover files deleted long ago, undo changes to configurations that have proven to be unworkable, or restore files that I've accidentally erased. It also has the advantage that when I make a backup of my Subversion repositories, I am also backing up a complete history of my important files. Finally, the fact that I keep a working copy checked out on multiple machines means that I am well protected from data loss due to catastrophe or theft. I could lose all of my backup CDs and my repository machine in a fire, but if one of the machines that has a working copy survives, then I at least have a pretty recent version of my files intact.
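As a rough sketch of two of those recoveries, the commands below restore an accidentally erased file and then undo an unworkable configuration change. The sandbox, the file name settings.conf, and its contents are made up for the illustration, and this is one possible way to do it rather than the only one.

```shell
set -e
# Hypothetical scratch repository and working copy.
SANDBOX=$(mktemp -d)
svnadmin create "$SANDBOX/repo"
REPO_URL="file://$SANDBOX/repo"
svn checkout -q "$REPO_URL" "$SANDBOX/wc"
cd "$SANDBOX/wc"

# Revision 1: a working configuration.
echo "color=blue" > settings.conf
svn add -q settings.conf
svn commit -q -m "Add settings"

# Revision 2: a change that later proves unworkable.
echo "color=plaid" > settings.conf
svn commit -q -m "Try plaid"

# Accidentally erase the file, then restore it from the
# unmodified copy that the working copy keeps hidden away.
rm settings.conf
svn revert -q settings.conf

# Undo the unworkable change: fetch the revision 1 contents
# from the repository and commit them as revision 3.
svn cat -r 1 "$REPO_URL/settings.conf" > settings.conf
svn commit -q -m "Go back to blue"
cat settings.conf
```

The plaid version is not lost by this. It remains in the repository as revision 2 and can be fetched again with svn cat -r 2.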

On the down side, using Subversion will increase your disk space usage by quite a bit. This is usually not a concern on personal machines, where people usually have space to spare, but it may be a concern if you have a disk quota on a multi-user machine. The repository itself will be at least as big as your initial set of files, and will grow; each working copy, which includes the files you work on as well as a hidden copy of those files in an unmodified state, usually takes more than three times the disk space of an unmanaged set of files.

As an example, my open-source projects directory contains about 3MB of source files when exported from Subversion (i.e. as unmanaged files). The working copy for these same files takes 11MB. The repository that keeps track of them is currently at revision number 196, and takes 6MB.

The small size of the repository may seem a bit odd at first, especially since it has the complete history of 196 different versions of my files! This paradox is resolved by the fact that the repository only has to track the differences in the files from one version to the next. For example, between revisions 192 and 193 I may have changed only one line of one file. That one line and the context for it (i.e. file and location) are all that has to be stored in order to move from version 192 to 193.

Space usage can be mitigated somewhat if you run the repository on a personal machine, and just keep a working copy on the multi-user machine. This is in fact what I do with my University computing account, where a good portion of my home directory is really a working copy that has been checked out of a Subversion repository that is running on my own networked Linux box.

Installing Subversion

This step can be very easy or very difficult depending on your target OS and personal level of control over the hardware. Subversion can be built as a user of any of the supported operating systems, but the easiest way to install it is to use pre-compiled packages, which require that you have unlimited access to the configuration of your target system. If you are trying to build it yourself, then be prepared to spend some time getting it all correct.

The simplest platforms on which to use Subversion are the ones that have binary distributions: Linux, Mac OS X, and Windows. A GUI client called RapidSVN is also available. It uses the wxWindows toolkit to give it some platform independence, and there are binary versions for Windows and a few variants of Linux.

If you are using Windows, you can also download a GUI system called TortoiseSVN which integrates with Windows Explorer to give you point-and-click access to your file management functions.

Setting up a repository

The first thing to do in order to use Subversion is to create a repository. This is nothing more than a directory that stores the Subversion database for a set of files. You can have as many repositories as you want, and I suggest making different repositories for files that need different levels of security. For example, I do not want to share my UNIX home directory with the world, so I put that in a repository that has very strong access restrictions (SSL and authentication required). I also work on projects that I make freely available on the Internet. I make those repositories read-only, and require authentication for committing changes.

The creation steps are the same for either kind of repository. The physical separation just gives you a way to easily break up your security policies later. The command to create a repository is:

   svnadmin create name

Where name is the name of the directory into which your new repository should be created. The directory should not already exist, but the path to it should. Once it is created, you should change the ownership and permissions on the directory to appropriate settings. For example, if the repository will only be used through Apache, then the user that runs Apache should be the owner of the repository, and that should also be the only user that can read/write the repository files.

If you plan to use Subversion from local disks only, then you may have some trouble if you want more than one person to write files to the repository. The problem is that new files are created from time to time, and the person who is using Subversion at the time ends up being the owner.

If you are using a binary distribution of Subversion, then you should have received a pre-compiled version of Apache and the modules needed to run a networked Subversion repository. The security of a network repository is completely controlled through Apache, and the instructions for setting up simple network access can be found in the Subversion Book available from http://subversion.tigris.org.

I find that the networked method of access is better in the long run for almost all uses, because it avoids permission, ownership, and process interruption issues that can cause problems with direct disk access.

Nevertheless, some users may need to use local disk instead of networked access. I have two warnings for those users:

  1. Do not try to interrupt Subversion commands. Killing the commands can leave locks in bad states. I have never lost data because of this, but I have had to go into the repository database directory and run db_recover to fix stale locks, or svn cleanup to do the same for a working copy.
  2. You will have problems if you want to work with multiple read-write users. The transaction logs are created with owner-only permissions, so even being in the same group doesn't help. There may be a workaround, but I am not aware of it. Note that if you are the only one who will write, then there is no problem.
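For reference, the recovery commands mentioned in the first warning look roughly like the sketch below. It uses svnadmin recover, the svnadmin subcommand for repository recovery, in place of running db_recover by hand; that substitution is an assumption about what suits your setup. The sandbox paths are hypothetical, and on a healthy repository and working copy both commands simply find nothing to repair.

```shell
set -e
# Hypothetical scratch repository and working copy.
SANDBOX=$(mktemp -d)
svnadmin create "$SANDBOX/repo"
svn checkout -q "file://$SANDBOX/repo" "$SANDBOX/wc"

# Repository side: run this only while no other process is
# using the repository, since it needs exclusive access.
svnadmin recover "$SANDBOX/repo"

# Working copy side: clear stale locks and finish any
# operation that was interrupted.
svn cleanup "$SANDBOX/wc"
```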

Repository Layout and Addressing

Locations in your new repository are accessed via Internet URL syntax. If you are using local disk access, then you point to files with a file URL. Note the three slashes: the host part is empty, and the repository path is absolute:

file:///path_to_repository/path_to_file

If you are using network access:

http://alias/path_to_file

where alias is the path you tell Apache to use for accessing the repository.

Subversion understands other URL types, including HTTPS, and a special one that lets you access a disk-based repository through secure shell (svn+ssh://host/path).
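A quick way to get a feel for URL addressing is to point a few commands at a repository URL without checking anything out. The sketch below assumes a throwaway repository in a hypothetical sandbox directory; the letters directory is invented for the illustration.

```shell
set -e
# Hypothetical scratch repository.
SANDBOX=$(mktemp -d)
svnadmin create "$SANDBOX/repo"
# $SANDBOX is an absolute path, so this yields file:///...
REPO_URL="file://$SANDBOX/repo"

# Create a directory directly in the repository by URL.
svn mkdir -q -m "Create a letters directory" "$REPO_URL/letters"

# List the top level of the repository, then show the
# details of one location inside it.
svn list "$REPO_URL"
svn info "$REPO_URL/letters"
```

The same commands should work unchanged against an http://, https://, or svn+ssh:// URL, provided the server side is set up for it.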

Examples

The following example assumes a user named nancy whose home directory is /home/nancy, whose existing documents are in /home/nancy/docs, and who wants to keep the repository in /home/nancy/svn/docs.

The Subversion commands will work at the command prompt on Linux/OS X/Windows if you substitute proper path/file naming. The other commands are UNIX-specific, and will only work on Linux and OS X. If you plan to use a GUI client, you will still need to do the initial repository setup from a command line.

  1. Make sure /home/nancy/svn exists, is owned by nancy, and is readable and writable by nancy, but no one else. This is platform specific.
       mkdir -p /home/nancy/svn
       chmod 700 /home/nancy/svn
    
  2. Create the repository:
       cd /home/nancy/svn
       svnadmin create docs
    
  3. Move the existing documents directory to somewhere else:
       mv /home/nancy/docs /home/nancy/olddocs
    
  4. Import the files:
       svn import /home/nancy/olddocs file:///home/nancy/svn/docs
    

    This copies the documents into the repository. Note that your olddocs directory is never part of the Subversion system. It is a backup of your documents before they were under the management of Subversion.

  5. Check out a copy of the versioned files:
       cd /home/nancy
       svn co file:///home/nancy/svn/docs docs
    

You should see a list of your files as they are placed into the (new) docs directory. At this point you are ready to edit the files.

Be aware that you should no longer manage the files with regular file system commands (i.e. do not delete, copy, or rename them with the operating system once they are in Subversion). Subversion needs to know when such a change occurs so it can track it, so it has its own commands for doing these things.
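The Subversion equivalents of those file system operations are svn copy, svn move, and svn delete. Here is a minimal sketch in a throwaway sandbox; the file names are invented for the illustration.

```shell
set -e
# Hypothetical scratch repository and working copy.
SANDBOX=$(mktemp -d)
svnadmin create "$SANDBOX/repo"
svn checkout -q "file://$SANDBOX/repo" "$SANDBOX/docs"
cd "$SANDBOX/docs"

# Put two files under version control.
echo "draft" > report.txt
echo "scratch" > junk.txt
svn add -q report.txt junk.txt
svn commit -q -m "Add files"

# Copy, rename, and delete through Subversion so that the
# history of each file is tracked.
svn copy -q report.txt report-template.txt
svn move -q report.txt final-report.txt
svn delete -q junk.txt

# Review what is scheduled, then commit it.
svn status
svn commit -q -m "Copy, rename, and delete files"
```

Nothing changes in the repository until the final commit; up to that point the copy, rename, and delete are only scheduled in the working copy.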

How To Manage Files with Subversion

Final Comments

Subversion is a great tool for keeping a history of the changes to a set of files. It provides a useful extension of the classic backup schemes, and helps you share files among multiple machines on a network.

References


Subversion Home Page: http://subversion.tigris.org
Apache Web Server Installation and Configuration: http://www.apache.org