Subversion, as you probably already know, is a version control system written from scratch to replace CVS, the most popular open source version control system. While there are many reasons to choose Subversion, one of the most interesting is that Subversion has been designed and implemented as a collection of reusable libraries, written in C. This allows your programs to use the same functionality found in the command line Subversion client without having to call out to that client, execute its commands, or parse its output. This article briefly reviews the Subversion libraries, explains some of their data structures, and demonstrates the use of the Subversion client APIs in other programs.
Before you can jump into the code, you need to install Subversion.
This article was written with
release 0.20.0 of Subversion in mind. It would be best if you had
that version. The installation instructions are available in the
INSTALL file. If you don't like to compile your own
software, you can try a
binary distribution of Subversion.
If you're using an older version of Subversion, it's a good idea to
upgrade at least to 0.20.0. If you're using a more current version,
just watch your step. The Subversion project has not yet released
version 1.0, the APIs are not yet fixed, and things may change. To
get a good idea of the changes between 0.20.0 and the version you
have, look at the CHANGES file in the Subversion tarball,
specifically the "Developer-visible changes" sections. The general
concepts discussed in this article will still apply to any version of
Subversion.
Once you've installed Subversion, you'll need to become familiar with its general use. This article assumes some basic knowledge of how Subversion works. If you've never used it before, take a break and learn how before reading any further. Some good resources for this are Rafael Garcia-Suarez's great articles on Single User Subversion and Multiuser Subversion.
You may be thinking that using the Subversion libraries directly will add a bit of complexity to your life as a software developer. You'll need to make your build process find the proper libraries and pass the correct flags to your compiler to link to them, not to mention learning a whole new API--there's a lot to do! I'd be surprised if you haven't at least thought about giving up now and simply writing a little wrapper library that calls the Subversion command line client.
What do you actually gain from using the libraries directly? Other than the efficiency gained by avoiding the overhead of starting processes for every action, you gain something much more important: correctness. You get access to all the information the client API can give you, rather than the limited subset of that information that the command line client can give you. The command line client is a fantastic program, but it was written to be a good generic command line tool, not to do whatever it is you've got in mind for your application. You'll likely do a better, more complete job by using the Subversion APIs directly.
Like most other software systems, Subversion is built on a number of smaller bits of code. In order to use Subversion's API well, you'll need to understand its underlying libraries.
To ensure maximum portability across a wide range of operating systems, Subversion is built on APR (the Apache Portable Runtime), the Apache Group's portability layer. The APR developers use doxygen markup to comment their code, so you can browse their documentation online. In addition to abstracting away the various platform-specific bits of functionality Subversion needs, APR provides a set of basic data structures such as hash tables and memory pools. We'll cover memory pools first, as they're probably less familiar.
Before we get into any APR data types, you'll have to learn how to
initialize and shut down APR. Simply call apr_initialize
before calling any APR (or Subversion) functions and use
atexit or some other means to arrange for
apr_terminate to be called at shutdown.
Rather than manually allocating and deallocating memory with
malloc and free, Subversion uses APR's
memory pools to manage memory. Create a pool using
svn_pool_create, which is actually a thin wrapper around
apr_pool_create, with a simpler interface and a few
Subversion debugging tricks. Allocate memory from the pool with
functions like apr_palloc and apr_pcalloc.
You don't need to worry about freeing the memory. When you're done
with everything you allocated out of that pool, just destroy the pool
with svn_pool_destroy (again, a thin wrapper around
apr_pool_destroy). It will free the memory for you.
This is kind of cool, since you only need to worry about freeing
memory once, but it's nothing to write home about. The real benefit
comes when you take advantage of chaining pools together in a
hierarchy. You can create subpools inside your main pool (or inside
other subpools, ad infinitum), and clear them with
svn_pool_clear. This lets you avoid making the operating
system allocate more memory for you, and can give a nice performance
boost in some situations.
Unfortunately, you need to be careful with pools, because you can easily get into situations where a pool is growing without bound as you allocate from it within a loop. To avoid this situation, you have to use common sense. Create a subpool before the loop and clear it each time through the loop. To avoid losing access to things you allocate inside the loop, duplicate them into the parent pool--like this:
char **
function (int iterations, apr_pool_t *pool)
{
  /* allocate an iteration pool. */
  apr_pool_t *subpool = svn_pool_create (pool);
  /* allocate some zeroed memory to hold our results, with one extra slot
   * left NULL to terminate the array. */
  char **array = apr_pcalloc (pool, (iterations + 1) * sizeof (*array));
  int i;
  for (i = 0; i < iterations; ++i)
    {
      char *result = some_function_that_takes_a_pool (i, subpool);
      /* duplicate the result into our main pool. */
      array[i] = apr_pstrdup (pool, result);
      /* throw away everything allocated during this iteration. */
      svn_pool_clear (subpool);
    }
  /* clean up after ourselves. */
  svn_pool_destroy (subpool);
  /* return our results, safely allocated in our main pool. */
  return array;
}
As long as you're careful with how you use pools, you'll find that they greatly simplify the logic of your code. You can stop worrying about memory management and start concentrating on what your code actually does.
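To make the difference between clearing a pool and destroying it concrete, here's a toy model of a pool hierarchy built from nothing but calloc and free. Treat it as a sketch under loud assumptions: every toy_ name is invented for illustration, and APR's real pools are implemented quite differently. In particular, this toy hands memory straight back to the C library when a pool is cleared, whereas the point of clearing a real pool is that its memory can be reused.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only. Every toy_ name is hypothetical; none of this is APR. */

/* Header placed in front of each allocation; the union keeps the
 * memory that follows it suitably aligned. */
typedef union toy_block
{
  union toy_block *next;
  long double align;
} toy_block_t;

typedef struct toy_pool
{
  struct toy_pool *parent;
  struct toy_pool *first_child;
  struct toy_pool *next_sibling;
  toy_block_t *blocks;      /* everything allocated from this pool */
  size_t live_allocs;       /* how many blocks are on that list */
} toy_pool_t;

void toy_pool_destroy (toy_pool_t *pool);

/* Create a pool; a non-NULL PARENT makes the new pool a subpool of it. */
toy_pool_t *
toy_pool_create (toy_pool_t *parent)
{
  toy_pool_t *pool = calloc (1, sizeof (*pool));
  if (pool && parent)
    {
      pool->parent = parent;
      pool->next_sibling = parent->first_child;
      parent->first_child = pool;
    }
  return pool;
}

/* Allocate SIZE zeroed bytes that live until POOL is cleared or destroyed. */
void *
toy_palloc (toy_pool_t *pool, size_t size)
{
  toy_block_t *block;
  assert (pool != NULL);
  block = calloc (1, sizeof (*block) + size);
  if (! block)
    return NULL;
  block->next = pool->blocks;
  pool->blocks = block;
  pool->live_allocs++;
  return block + 1;
}

/* Copy STR into memory owned by POOL. */
char *
toy_pstrdup (toy_pool_t *pool, const char *str)
{
  size_t len = strlen (str) + 1;
  char *copy = toy_palloc (pool, len);
  if (copy)
    memcpy (copy, str, len);
  return copy;
}

/* Free everything in POOL and its subpools, but keep POOL itself usable. */
void
toy_pool_clear (toy_pool_t *pool)
{
  assert (pool != NULL);
  while (pool->first_child)
    toy_pool_destroy (pool->first_child);
  while (pool->blocks)
    {
      toy_block_t *next = pool->blocks->next;
      free (pool->blocks);
      pool->blocks = next;
    }
  pool->live_allocs = 0;
}

/* Clear POOL, unhook it from its parent, and free the pool itself. */
void
toy_pool_destroy (toy_pool_t *pool)
{
  toy_pool_clear (pool);
  if (pool->parent)
    {
      toy_pool_t **link = &pool->parent->first_child;
      while (*link != pool)
        link = &(*link)->next_sibling;
      *link = pool->next_sibling;
    }
  free (pool);
}
```

Even in this simplified form, two properties of the design show up. Clearing a subpool never touches anything allocated from its parent, which is why duplicating results into the parent is safe. And destroying the root pool tears down every subpool beneath it, so a single call at the end is enough.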
Another common APR data type used in Subversion is the
apr_hash_t. This is just a standard hash table, designed
to work with APR pools. It uses void pointers for its keys and
values, so you can stick whatever you want in it, as long as you're
careful to remember the type so you can cast the contents
appropriately when you retrieve values.
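The casting discipline is easier to see in a small example. The sketch below is not apr_hash_t and does not use APR at all: it's a hypothetical fixed-size lookup table (linear search, not a real hash) whose only job is to show values going in as void pointers and coming back out as the type the caller remembered. All of the names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a void-pointer table. Hypothetical names, not APR. */
#define TOY_TABLE_MAX 16

typedef struct toy_table
{
  size_t count;
  const char *keys[TOY_TABLE_MAX];
  void *values[TOY_TABLE_MAX];
} toy_table_t;

/* Store VALUE under KEY, replacing any existing entry. Only the pointers
 * are stored, so the caller must keep KEY and VALUE alive.
 * Returns 0 on success, or -1 if the table is full. */
int
toy_table_set (toy_table_t *table, const char *key, void *value)
{
  size_t i;
  assert (table != NULL && key != NULL);
  for (i = 0; i < table->count; ++i)
    if (strcmp (table->keys[i], key) == 0)
      {
        table->values[i] = value;
        return 0;
      }
  if (table->count == TOY_TABLE_MAX)
    return -1;
  table->keys[table->count] = key;
  table->values[table->count] = value;
  table->count++;
  return 0;
}

/* Return the value stored under KEY, or NULL if there is none. */
void *
toy_table_get (const toy_table_t *table, const char *key)
{
  size_t i;
  assert (table != NULL && key != NULL);
  for (i = 0; i < table->count; ++i)
    if (strcmp (table->keys[i], key) == 0)
      return table->values[i];
  return NULL;
}

/* The type the caller has to remember it stored. */
typedef struct site_info
{
  long revision;
} site_info_t;

/* Look up PATH and convert the void pointer back to a site_info_t.
 * Nothing checks that the conversion is right; that's the caller's job. */
long
lookup_revision (const toy_table_t *table, const char *path)
{
  const site_info_t *info = toy_table_get (table, path);
  return info ? info->revision : -1;
}
```

The table never learns what a site_info_t is. If you stored some other type under the same key, lookup_revision would happily misread it, which is exactly why you need to be careful to remember what you put in.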
In addition to the various data structures it inherits from APR,
Subversion has several fundamental data types of its own. The most
important of these is svn_error_t, used everywhere in the
Subversion API.
Rather than returning a generic error code to indicate that a
function has failed, Subversion uses its own "exception" object,
svn_error_t, as the return value for all its functions
that can fail. If a Subversion API function succeeds, it returns the
value SVN_NO_ERROR (which is actually 0, to simplify
error checking). If it fails, it returns a new
svn_error_t. Each svn_error_t contains an
apr_status_t--either the return value of the underlying
APR function that failed or the Subversion specific error code.
All Subversion error codes are defined in
svn_error_codes.h. There is also a const char
* that describes what precisely went wrong, a pointer to
another error (as svn_error_ts can be chained together),
and the pool that the error was allocated from. When Subversion
returns an error, you need to handle it, usually with
svn_error_clear, in order to free the memory associated
with the error and any other errors in its chain. All of the other
error-handling functions are declared and documented in
svn_error.h. Here's an example of a function that
handles an svn_error_t.
void
handle_error (svn_error_t *error)
{
  svn_error_t *itr = error;
  /* walk the chain, printing each error in turn. */
  while (itr)
    {
      char buffer[256] = { 0 };
      printf ("source error code: %d (%s)\n",
              itr->apr_err,
              svn_strerror (itr->apr_err, buffer, sizeof (buffer)));
      printf ("error description: %s\n",
              itr->message ? itr->message : "(none)");
      if (itr->child)
        printf ("\n");
      itr = itr->child;
    }
  /* free the memory used by the whole chain. */
  svn_error_clear (error);
}
You might notice that many function calls are wrapped in the
SVN_ERR macro. This is just a quick way of saying that
if the function returns SVN_NO_ERROR, we should continue
on, but if it returns anything else, we return the error to our
calling function, propagating the error up the call stack to be
handled elsewhere. As long as your functions also return
svn_error_t *s, you can use this macro.
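If the early-return behavior of such a macro is hard to picture, the following sketch may help. It's a stand-in written against a toy error type, not the real definition from svn_error.h, and every toy_ and TOY_ name is made up for this example. It only models the propagation described above: stop at the first failure and hand the error to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Toy error type. Hypothetical names; the real type is svn_error_t. */
typedef struct toy_error
{
  int code;
  const char *message;
  struct toy_error *child;
} toy_error_t;

#define TOY_NO_ERROR NULL

/* If EXPR yields an error, return it to our caller immediately;
 * otherwise fall through to the next statement. */
#define TOY_ERR(expr)                      \
  do {                                     \
    toy_error_t *toy_err_tmp = (expr);     \
    if (toy_err_tmp)                       \
      return toy_err_tmp;                  \
  } while (0)

/* A statically allocated error, so this sketch needs no cleanup. */
static toy_error_t disk_full = { 28, "no space left on device", NULL };

/* Pretend to do one unit of work, counting it in STEPS_RUN. */
toy_error_t *
step (int should_fail, int *steps_run)
{
  ++*steps_run;
  return should_fail ? &disk_full : TOY_NO_ERROR;
}

/* Run three steps; FAILING_STEP (1 to 3) picks one to fail, 0 means none.
 * Because this function returns toy_error_t *, it can use TOY_ERR. */
toy_error_t *
run_three_steps (int failing_step, int *steps_run)
{
  TOY_ERR (step (failing_step == 1, steps_run));
  TOY_ERR (step (failing_step == 2, steps_run));
  TOY_ERR (step (failing_step == 3, steps_run));
  return TOY_NO_ERROR;
}
```

When the second step fails, the third never runs, and the caller of run_three_steps receives the error from the second. One difference matters in practice: the toy error here is static, while a real svn_error_t has memory behind it, so whoever finally handles it still has to clear it.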
All right, enough background, on to the actual Subversion client API.
Subversion is broken into several libraries. From the client
developer's point of view, the most important is
libsvn_client, which holds the functions used to
implement the various commands you've seen in the command line client.
This library is a wrapper around the underlying libraries which manage
access to the working copy (your checked out copy of the contents of
the repository) and to the repository (via a number of possible
paths).
All the functions in this library share some common
characteristics. First, they all assume textual data (paths, URLs,
raw text like a log entry or a username, etc.) is UTF-8 encoded and
uses newlines for line endings. This provides consistency between
clients and avoids the need for cumbersome and unnecessary locale and
line ending data tagging. To convert your data into UTF-8, use the
functions in svn_utf.h. For the purposes of this article,
we will assume all input is in ASCII and will avoid conversion to
UTF-8.
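Skipping the conversion is only safe if the input really is ASCII, since 7-bit ASCII text is already valid UTF-8. Here's a minimal sketch of a check you could run on a string before handing it to the library. The function name is hypothetical and isn't part of Subversion or APR; anything that fails the check should go through the functions in svn_utf.h instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Subversion or APR.
 * Returns 1 if every byte of STR is 7-bit ASCII (and so already valid
 * UTF-8), or 0 if any byte has its high bit set. */
int
is_plain_ascii (const char *str)
{
  const unsigned char *p = (const unsigned char *) str;
  assert (str != NULL);
  for (; *p; ++p)
    if (*p > 0x7F)
      return 0;
  return 1;
}
```

Note that this only answers whether conversion can be skipped. It says nothing about line endings, which the library also expects to be plain newlines.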
Second, each function takes an apr_pool_t pointer to
use for memory allocation.
Third, each function takes a pointer to a structure called
svn_client_ctx_t. This "client context" serves as a
container for several different things that are used across many
libsvn_client functions. For example, all the Subversion
functions that commit a change to the repository require the client to
provide them with a log message. To do this, they use a callback
function and baton that are stored in the client context. Similarly,
many of the functions need to provide progress notification to the
calling application and, eventually, to the user. The library
functions call a notification callback-baton pair that are passed in
via the context. The client context also caches configuration
options, so the libraries don't need to read them in whenever they
require them.
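The callback-and-baton idea is worth seeing in isolation before it shows up in real code. In the sketch below, a pretend library function reports each path it processes through a function pointer stored in a context, passing along an opaque baton that only the application understands. The toy_ names and the shape of the structure are assumptions made for this example; the real svn_client_ctx_t and its callbacks are documented in svn_client.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy versions of a notification callback and a client context.
 * Hypothetical names; see svn_client.h for the real ones. */
typedef void (*toy_notify_func_t) (void *baton, const char *path);

typedef struct toy_ctx
{
  toy_notify_func_t notify_func;   /* may be NULL: no notification */
  void *notify_baton;              /* handed back to notify_func untouched */
} toy_ctx_t;

/* "Library" side: process each path and report it through the context.
 * The library never looks inside the baton. */
void
toy_checkout (const char *const *paths, size_t n_paths, const toy_ctx_t *ctx)
{
  size_t i;
  assert (ctx != NULL);
  for (i = 0; i < n_paths; ++i)
    if (ctx->notify_func)
      ctx->notify_func (ctx->notify_baton, paths[i]);
}

/* "Application" side: the baton carries whatever state the callback needs. */
typedef struct count_baton
{
  size_t files_seen;
  size_t path_chars;
} count_baton_t;

/* Tally each reported path in the count_baton_t passed as BATON. */
void
count_paths (void *baton, const char *path)
{
  count_baton_t *cb = baton;
  cb->files_seen++;
  cb->path_chars += strlen (path);
}
```

The baton is what lets one library function serve many applications: a command line tool might pass a baton holding output options, while a GUI might pass a pointer to a progress bar. The library's code is the same in both cases.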
To use the rest of libsvn_client, you will have to
fill in a bare minimum of the context. Here's an example of how to do
it:
svn_error_t *
set_up_client_ctx (svn_client_ctx_t **ctx, apr_pool_t *pool)
{
  /* allocate our context, using apr_pcalloc to ensure it is zeroed out.
   * note that we need the size of the structure, not of the pointer. */
  *ctx = apr_pcalloc (pool, sizeof (**ctx));
  /* read in the client's config options. */
  SVN_ERR (svn_config_get_config (&(*ctx)->config, pool));
  /* set up an authentication baton. details of how to use libsvn_auth are
   * beyond the scope of this article, but for more details on its use you
   * can read the code for the subversion command line client and the
   * comments in svn_auth.h. */
  {
    svn_auth_baton_t *auth_baton;
    apr_array_header_t *providers
      = apr_array_make (pool, 1, sizeof (svn_auth_provider_object_t *));
    svn_auth_provider_object_t *username_wc_provider
      = apr_pcalloc (pool, sizeof (*username_wc_provider));
    svn_wc_get_username_provider
      (&(username_wc_provider->vtable),
       &(username_wc_provider->provider_baton), pool);
    *(svn_auth_provider_object_t **)apr_array_push (providers)
      = username_wc_provider;
    svn_auth_open (&auth_baton, providers, pool);
    (*ctx)->auth_baton = auth_baton;
  }
  return SVN_NO_ERROR;
}
The comments in svn_client.h provide more details on
the contents of svn_client_ctx_t.
All you need now to start using the Subversion libraries are a few
details on how to compile and link against them. For all the examples
in this article, you will have to link against
libsvn_client-1, libsvn_auth-1, and
libsvn_subr-1. The header files are located in
$(PREFIX)/include/subversion-1, where
$(PREFIX) is either the path you specified for
--prefix when configuring Subversion or
/usr/local by default. You should also include the
output of svn-config --includes --cflags --libs in your
compile and link lines.
Eventually, you should be able to forget about manually specifying
Subversion's include paths and libraries, allowing svn-config to
take care of the details. But for now you will have to do it yourself.
Here's the Makefile I used when preparing this article,
which should get you far enough along to get things working.
PREFIX=/Users/rooneg/Hacking/article
CC=cc
CFLAGS=`$(PREFIX)/bin/svn-config --cflags` -Wall
INCLUDES=`$(PREFIX)/bin/svn-config --includes` -I$(PREFIX)/include/subversion-1
LIBS=`$(PREFIX)/bin/svn-config --libs` -L$(PREFIX)/lib -lsvn_subr-1 \
     -lsvn_auth-1 -lsvn_client-1 -lsvn_wc-1

.c.o:
	$(CC) $(CFLAGS) $(INCLUDES) -c $<

basic-client: basic-client.o
	$(CC) $(CFLAGS) $(LIBS) basic-client.o -o $@

clean:
	rm -rf *.o
	rm -rf basic-client
Now you're ready to write an actual application that uses
libsvn_client. Let's say your company makes a web
application. You store everything in a Subversion repository and can
simply check out some parts of your source onto the web server and
things just work. Suppose also that several of your web developers
are unskilled in using version control tools. To simplify life for
them, you're writing an application to let them deploy a site from the
tree to the server, query what versions of each file they have there,
and update to new versions from the repository.
You will need to use at least three functions from
libsvn_client. svn_client_checkout deploys
the site for the first time to a new
server. svn_client_status checks the versions deployed on
a given server. svn_client_update deploys new versions
of the site to an existing install on the server. We'll look at each
function in turn.
As you might guess, svn_client_checkout implements the
svn checkout command. It takes as arguments the URL to
check out from the repository, a path that will become the root of the
new working copy it creates, and a number that indicates which
revision of the URL you want to check out. There's also a boolean
flag that indicates if the checkout should recurse into subdirectories
inside the URL. Besides these specific arguments, the normal
libsvn_client function arguments apply: a client context
and a pool. If you want to provide feedback to your user as the
checkout takes place, you can provide a notification callback and
baton inside the context to be called each time something happens.
Here's an example of how your application could use it:
void
deploy_notification_callback (void *baton,
                              const char *path,
                              svn_wc_notify_action_t action,
                              svn_node_kind_t kind,
                              const char *mime_type,
                              svn_wc_notify_state_t content_state,
                              svn_wc_notify_state_t prop_state,
                              svn_revnum_t revision)
{
  printf ("deploying %s\n", path);
}

void
deploy_new_site (const char *repos_url,
                 const char *target_path,
                 svn_client_ctx_t *ctx,
                 apr_pool_t *pool)
{
  svn_opt_revision_t revision = { 0 };
  svn_error_t *err;
  /* set up our notification callback. our callback doesn't use a baton, so
   * we can just leave that blank. */
  ctx->notify_func = deploy_notification_callback;
  /* grab the most recent version of the website. */
  revision.kind = svn_opt_revision_head;
  err = svn_client_checkout (repos_url,
                             target_path,
                             &revision,
                             TRUE, /* yes, we want to recurse into the URL */
                             ctx,
                             pool);
  if (err)
    handle_error (err);
  else
    printf ("deployment succeeded.\n");
}
Now that your application can deploy a new website, it needs to be
able to query the deployed version to find out which versions of each
file are there. This needs svn_client_status, the
routine that implements the core of the svn status
command. svn_client_status is a bit more complicated
than svn_client_checkout, as there are more variations.
If the "descend" argument is TRUE, it recurses down a
path in a working copy, filling in an apr_hash_t with
keys that contain each entry's path and values that are
svn_wc_status_ts. Otherwise, it just reads the entries
in the top level of the directory structure.
To check the status of the entry in the working copy against that
of the repository, you can pass TRUE as the "update" flag. The
svn_wc_status_ts in the hash will have their
repos_text_status and repos_prop_status
members filled in appropriately. This will also fill in the
youngest argument with the number of the most current
revision in the repository.
Use the "get_all" argument to switch between fetching all entries
in the working copy in the hash or only the "interesting" entries
(either locally modified or out of date compared to the repository).
If you don't want the svn:ignore property to control
which entries are seen, pass TRUE for the "no_ignore"
argument. As with svn_client_checkout, any notification
callback will be called, along with the context's notification baton,
for each entry placed in the hash. The following example uses
svn_client_status to print the revision numbers of
everything in the deployed site.
void
print_revisions (const char *deployed_site,
                 svn_client_ctx_t *ctx,
                 apr_pool_t *pool)
{
  apr_hash_t *statuses;
  svn_error_t *err;
  err = svn_client_status (&statuses,
                           NULL,
                           deployed_site,
                           TRUE, /* descend into subdirs */
                           TRUE, /* get all entries */
                           FALSE, /* don't hit repos for out of dateness info */
                           FALSE, /* respect svn:ignore */
                           ctx,
                           pool);
  if (err)
    {
      handle_error (err);
      return;
    }
  /* loop over the hash entries and print them out */
  {
    apr_hash_index_t *hi;
    for (hi = apr_hash_first (pool, statuses); hi; hi = apr_hash_next (hi))
      {
        const svn_wc_status_t *status;
        const char *path;
        const void *key;
        void *value;
        apr_hash_this (hi, &key, NULL, &value);
        status = value;
        path = key;
        if (status->entry)
          printf ("%s is at revision %" SVN_REVNUM_T_FMT "\n",
                  path, status->entry->revision);
        else
          printf ("%s is not under revision control\n", path);
      }
  }
}
The final feature for your application is the ability to update a
deployed site to a newer version, using svn update. As
you might suspect, the libsvn_client function for this is
svn_client_update. Fortunately, this is much simpler
than svn_client_status. Pass it the path to the deployed
site, an svn_opt_revision_t identifying the version to
which to update, and a flag to allow or disallow recursion into
subdirectories. As usual, it takes pool and context arguments, and
any notification callback in the context will be called for each
updated entry. Let's see how this can be used to update our deployed
website to the latest revision in the repository.
void
update_notification_callback (void *baton,
                              const char *path,
                              svn_wc_notify_action_t action,
                              svn_node_kind_t kind,
                              const char *mime_type,
                              svn_wc_notify_state_t content_state,
                              svn_wc_notify_state_t prop_state,
                              svn_revnum_t revision)
{
  if (action == svn_wc_notify_update_completed)
    printf ("Updated %s to revision %" SVN_REVNUM_T_FMT "\n", path, revision);
}

void
update_deployed_site (const char *deployed_site,
                      svn_client_ctx_t *ctx,
                      apr_pool_t *pool)
{
  svn_opt_revision_t revision = { 0 };
  svn_error_t *err;
  /* update to the most recent version in the repository. */
  revision.kind = svn_opt_revision_head;
  ctx->notify_func = update_notification_callback;
  /* pass the address of the revision structure, not the structure itself. */
  err = svn_client_update (deployed_site, &revision, TRUE, ctx, pool);
  if (err)
    {
      handle_error (err);
      return;
    }
}
There you have it, a simple set of functions that take the existing
functionality of libsvn_client and apply it to your
specific problem. Due to the design of Subversion, you can do this
much more flexibly than by wrapping up an existing command line
client. My next article will look at how to extend this to provide
the ability to edit the deployed files and to commit the changes back
into the repository, using the rest of libsvn_client.
Garrett Rooney is a software developer at FactSet Research Systems, where he works on real-time market data.
Copyright © 2009 O'Reilly Media, Inc.