Linux DevCenter    
 Published on Linux DevCenter (http://www.linuxdevcenter.com/)


Pre-Patched Kickstart Installs

by Q Ethan McCallum
02/17/2005

Editor's note: Ethan has collected this series and other information into Managing RPM-Based Systems with Kickstart and Yum.

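My two previous articles explained how to use Kickstart to automate OS installs and upgrades. This article demonstrates some techniques for the third piece of the system maintenance cycle: keeping your machines up to date. That includes how to set up your own yum repository, merge updates into the Kickstart install tree, and use a single release tree for change control.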

What's interesting is that the last two techniques are half technology, half architecture: a naming convention and a few symbolic links go a long way. Some custom code doesn't hurt, either.

I tested the steps outlined in this article under Fedora Core versions 2 and 3, but they should also work under Red Hat 9 and FC1. This article assumes you have a modest familiarity with RPM, Kickstart, and yum. Refer to the Resources section for links to documentation and other articles on these topics.

Setting Up Your Own yum Repository

Red Hat 9 and the Fedora Core series include yum (Yellow Dog Updater, Modified) to simplify system updates. Point yum to a collection of RPMs (a repository, or repo for short) and it will find the latest packages to install on your system. Fedora's default yum install includes public repo definitions, so you can keep your system up to date by running yum's cron jobs.

Running your own internal repo saves bandwidth if you have multiple machines, because only one machine fetches new RPMs from the outside world. This makes internal updates faster, because few Internet hookups can match LAN speeds. You can also fold your own RPMs into the repo and manage all software updates from the same centralized resource.

Most of all, pointing machines to a private repo gives you control: you can limit what yum sees and, in turn, what it installs. yum downloads a repo's newest RPMs, which aren't always the best for you. For example, a new version of a shared library may require you to recompile some homegrown code. That makes for an ugly surprise.

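A yum repo is just a collection of RPMs and some metadata extracted from them. yum clients use the metadata to determine what RPMs are in the repo. Setting up a repo, then, requires a job that downloads the RPMs, a web server that publishes them, and a tool that extracts the metadata.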

To set up your wget job, first select a download site from the Fedora team's list of update mirrors.

Next, wrap your wget command in a shell script, Perl tool, or whatever else suits your fancy. I use the following wget switches:

You can call your wget script manually or via cron. Please show courtesy to your download site's maintainer: schedule jobs for off-hours, and set the --wait flag so that wget pauses 60 to 120 seconds (or more) between downloads. If the job runs overnight, the extra download time won't make a difference.
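As a rough sketch of such a wrapper, the script below assembles a wget command line. The mirror URL, the destination directory, and the particular switches are my own illustrative assumptions, not necessarily the ones used for this article. By default it only prints the command; set RUN=1 to perform the download.

```shell
#!/bin/sh
# Sketch of a polite wget wrapper. The URL and paths are hypothetical
# placeholders; substitute a mirror from the Fedora team's list.
MIRROR_URL="${MIRROR_URL:-http://mirror.example.com/fedora/updates/3/i386/}"
DEST_DIR="${DEST_DIR:-/var/www/html/FC3-i386/updates/Fedora/RPMS}"
WAIT_SECS="${WAIT_SECS:-60}"

mirror_updates() {
    # --mirror          recurse and skip files that have not changed
    # --no-parent       never climb above the starting directory
    # --no-directories  drop every file straight into DEST_DIR
    # --accept          keep only RPM files
    # --wait            pause between downloads, out of courtesy
    set -- --mirror --no-parent --no-directories \
           '--accept=*.rpm' "--wait=$WAIT_SECS" \
           "--directory-prefix=$DEST_DIR" "$MIRROR_URL"
    if [ "${RUN:-0}" = "1" ]; then
        wget "$@"
    else
        echo "wget $*"      # dry run: show the command only
    fi
}

mirror_updates

# Example crontab entry (hypothetical script path), 02:30 every night:
# 30 2 * * * RUN=1 /usr/local/bin/fetch-updates.sh
```

Keeping the dry run as the default makes it easy to review the command before pointing it at someone else's server.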

Setting up the web server is even easier: point the document root to the directory where you downloaded the RPM updates. For flexibility and growth, you may want to standardize on a directory structure, such as that shown in Figure 1.

Figure 1. Sample directory structure for a yum repo, hosted on a Kickstart server

This directory structure accommodates several OS releases and architectures. In this example, the updated RPMs for Fedora Core 2, i386 architecture go under the web server's document root in FC2-i386/updates/Fedora/RPMS.
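One way to lay out such a tree is sketched below. The function name and the scratch document root are assumptions for illustration; the release-architecture names follow the pattern used in this article.

```shell
#!/bin/sh
# Create {doc_root}/{release-arch}/updates/Fedora/RPMS for each
# release-arch name given after the document root.
make_repo_tree() {
    doc_root=$1
    shift
    for rel_arch in "$@"; do
        mkdir -p "$doc_root/$rel_arch/updates/Fedora/RPMS"
    done
}

# Demo in a scratch directory; DOC_ROOT is a hypothetical stand-in
# for the web server's real document root.
DOC_ROOT="${DOC_ROOT:-$(mktemp -d)}"
make_repo_tree "$DOC_ROOT" FC2-i386 FC3-i386 FC3-ia64
find "$DOC_ROOT" -type d | sort
```

Because the release and architecture live in the top-level directory name, adding a new OS release later is just one more argument.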

Run yum-arch to extract RPM metadata if the repo is for FC2 or older. Using the directory structure from above:

$ yum-arch {document root}/FC2-i386/updates

This command scans the RPMs in the tree and dumps header information into the FC2-i386/updates/headers directory. There is one .hdr file for each RPM in the tree.

FC3 stores its header info in a different format, generated by the createrepo command:

$ createrepo {document root}/FC3-ia64/updates

createrepo stores the RPM metadata in a set of XML files. In the above example, these files exist under the web server document root in FC3-ia64/updates/repodata.

yum-arch still exists under FC3, so you can create the older header format for FC2 clients. It may be possible to run createrepo under older Red Hat releases in order to serve FC3 clients. Because both tools are written in Python, they might work under other operating systems. Admittedly, I haven't tried this.
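If you serve several releases, a small helper can pick the metadata tool for you. This is only a sketch: the function names are hypothetical, and it understands Fedora Core release numbers only (Red Hat 9 would need its own case).

```shell
#!/bin/sh
# Print the metadata tool for a Fedora Core release number:
# yum-arch for FC2 and older, createrepo for FC3 and newer.
metadata_tool() {
    if [ "$1" -ge 3 ]; then
        echo createrepo
    else
        echo yum-arch
    fi
}

# Rebuild the metadata for one tree, for example:
#   refresh_metadata 3 /var/www/html/FC3-ia64/updates
refresh_metadata() {
    release=$1
    tree=$2
    "$(metadata_tool "$release")" "$tree"
}

metadata_tool 2    # prints "yum-arch"
metadata_tool 3    # prints "createrepo"
```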

Configuring yum Clients to Use the New Repo

Configuring a client is as simple as editing a few text files.

For FC2 and earlier, the repo definitions live in /etc/yum.conf. You don't want the client machines downloading from the public repos anymore, so comment out those preexisting definitions with # characters.

Next, define an entry for your shiny new local repo:

[internal-updates]
name = internal update server
baseurl = http://{update-server}/FC$releasever-$basearch/updates

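This repo definition breaks down as follows: the bracketed label is the repo's unique ID, name is a human-readable description, and baseurl is the repo's URL. yum replaces $releasever and $basearch with the client's OS release and base architecture, so an FC2 client on i386 resolves the URL to http://{update-server}/FC2-i386/updates.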

FC3 separates repo definitions from the main yum.conf. To disable the existing repos, add:

enabled=0

to all of the .repo files in /etc/yum.repos.d. Create your own internal-updates.repo file that contains a single stanza, similar to the example FC2 entry. Next, test the repo configuration in a nondestructive manner:

# yum check-update

This will contact the repo web server, fetch RPM header info (either from headers or repodata, depending on the target machine's OS version), and list the RPMs for which updates are available. If you're satisfied with those results, tell yum to update the machine based on the repo's contents:

# yum update
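To make the FC3 client steps concrete, here is a minimal sketch that disables the existing repo definitions and writes the internal one. The function names and the example host name are assumptions; try it on a copy of /etc/yum.repos.d before touching the real directory.

```shell
#!/bin/sh
# Disable every stanza in the .repo files under directory $1: drop any
# existing "enabled" line, then add enabled=0 after each [section] header.
disable_repos() {
    for f in "$1"/*.repo; do
        [ -f "$f" ] || continue
        awk '/^enabled *=/ { next }
             { print }
             /^\[.*\]/ { print "enabled=0" }' "$f" > "$f.tmp" &&
            mv "$f.tmp" "$f"
    done
}

# Write the internal repo definition into directory $1.
# $2 is the update server's host name (hypothetical in the usage below).
write_internal_repo() {
    cat > "$1/internal-updates.repo" <<EOF
[internal-updates]
name = internal update server
baseurl = http://$2/FC\$releasever-\$basearch/updates
EOF
}

# Usage, as root on a client (disable first, then add the new repo):
#   disable_repos /etc/yum.repos.d
#   write_internal_repo /etc/yum.repos.d updates.example.com
```

The backslashes in the here-document keep $releasever and $basearch literal, so yum expands them on the client rather than the shell expanding them when the file is written.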

You certainly don't have to call yum by hand on all of your machines every time you want to update them. Enable the yum daemon to take advantage of automatic (cron'd) updates:

    (set the daemon to start on every system boot)
# chkconfig --add yum
# chkconfig yum on

    (start the daemon now, so you don't have to reboot)
# service yum start

There's a trade-off between the risk of unattended, automated updates and the cost of manual labor. Manual updates tend to win out in more formal shops. Later in the article, I'll demonstrate a method that provides a layer of change control while allowing machines to update themselves.

Merging Updates with the Kickstart Tree

As long as you've downloaded the updated RPMs, you may as well fold them into the Kickstart process. In turn, newly Kickstarted machines will start their life with the updates already applied. To do this, you must put the latest RPMs under the Kickstart tree's Fedora/RPMS directory, copy the Fedora/base directory (from the original OS install media) to the Kickstart tree, and rebuild the hdlist files. As before, formalizing a directory structure and naming convention will help your repo scale.

The first step is the toughest, because you can't simply download the RPM updates right into the Kickstart tree's os/Fedora/RPMS directory. Whereas yum downloads the latest RPMs, Anaconda (ergo Kickstart) doesn't gracefully handle situations in which there are multiple versions of a package in the install tree.

You must therefore replace old package versions in the install tree with their newer counterparts. Doing this by hand is for neither the faint of heart nor the lazy. Being a proud member of the latter category, I prefer to let code do the heavy lifting. The key is to use the RPM API to extract package header info and compare versions. Doing this based on RPM filenames alone is asking for a headache. (I've written a tool to do just this -- Novi.)
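For a feel of what such a tool does, here is a rough shell approximation (not Novi, and not the RPM API). It reads name, architecture, version, and release from each package header with rpm's --queryformat option, then keeps the highest version of each package. Note the assumptions: GNU sort's -V ordering stands in for RPM's own version comparison, epochs are ignored, and paths must not contain spaces, so treat the result as a starting point rather than a replacement for a proper tool.

```shell
#!/bin/sh
# Emit "name.arch version-release path" for each RPM in directory $1,
# taking the fields from the package headers, not from the filenames.
list_headers() {
    for f in "$1"/*.rpm; do
        [ -f "$f" ] || continue
        rpm -qp --queryformat '%{NAME}.%{ARCH} %{VERSION}-%{RELEASE} ' "$f"
        printf '%s\n' "$f"
    done
}

# Read such lines on stdin and keep only the newest entry per name.arch.
# ASSUMPTION: GNU sort -V approximates RPM's version ordering here.
newest_only() {
    LC_ALL=C sort -t ' ' -k1,1 -k2,2V |
        awk '{ last[$1] = $0 } END { for (n in last) print last[n] }' |
        LC_ALL=C sort
}

# Usage (hypothetical paths): list the packages that belong in the
# install tree, given the original and updated RPM directories.
#   { list_headers os/Fedora/RPMS; list_headers updates/Fedora/RPMS; } |
#       newest_only
```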

First, create a new directory to serve as the install tree. Using the directory structure outlined above, that is FC3-i386/dist under the document root. Put the latest packages under that directory, in Fedora/RPMS. (You can save space by hard-linking the RPMs from the install and update trees, if possible.) Copy the directory Fedora/base from the original OS media, too. You should have a directory structure similar to that of Figure 2.

Figure 2. Directory structure for a pre-updated Kickstart tree

Notice the dist directory has the same layout as the os directory, which holds the original install media. dist is essentially the os tree with newer RPMs.
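The hard-linking step might look like the sketch below. The function name is hypothetical, and it links every RPM it finds, so run it only on a source directory that already holds just the package versions you want in the install tree.

```shell
#!/bin/sh
# Hard-link each RPM from directory $1 into directory $2. Hard links
# only work within one filesystem, so fall back to a plain copy.
link_rpms() {
    src=$1
    dest=$2
    mkdir -p "$dest"
    for f in "$src"/*.rpm; do
        [ -f "$f" ] || continue
        ln -f "$f" "$dest/" 2>/dev/null || cp -p "$f" "$dest/"
    done
}

# Usage (hypothetical paths):
#   link_rpms /var/www/html/FC3-i386/newest /var/www/html/FC3-i386/dist/Fedora/RPMS
```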

Next, use genhdlist to rebuild the hdlist files. FC3 is a little pickier than FC2 and requires that you first generate a package order file:

$ PYTHONPATH=/usr/lib/anaconda \
  /usr/lib/anaconda-runtime/pkgorder \
  {path to FC3-i386/dist} \
  i386 Fedora > order.txt

$ /usr/lib/anaconda-runtime/genhdlist  \
  --withnumbers  \
  --fileorder order.txt  \
  --productpath Fedora  \
  {path to FC3-i386/dist}

FC2 requires only the genhdlist command, without the --withnumbers and --fileorder flags. Feel free to ignore pkgorder's "ignore package name relation(s)" warnings.

Point your Kickstart clients to the new install directory and add:

url --url http://{build-server}/FC2-i386/dist

to ks.cfg. You won't have to double back after the install to apply the updates.

Change Control: Having a Single Release Tree

Call yum-arch or createrepo on the Kickstart tree to let it do double duty as a yum repo. This creates a one-stop shop for your machines: whether you're installing the OS or just updating it, your entire shop will run the same set of RPMs.

Figure 3. Pre-updated Kickstart install tree that doubles as a yum repo

Notice this is the same as the dist directory mentioned earlier, just with the headers (or repodata) directory for the RPM metadata.

This setup is far from perfect, though. In a large shop, you probably want to test new RPMs on a few machines before you install them everywhere. Then again, you want to take advantage of cron jobs to let yum do its work unattended.

I've spent my career near or in software teams, and as a result I tend to think in terms of release versions: "What do we consider stable and production-quality?" versus "What are we still testing in the lab?" and so on. If you had a designated stable build, you'd have no problem letting your production machines update from it automatically. Furthermore, clearly labeled builds simplify systems management, because you can tell which labeled build a machine is running.

I applied some software development practices and devised a mix of directories and symbolic links to solve this problem. The trick is to create a new, labeled directory each time you fetch new updates from the public repos. I prefer to use dates as my labels in YYYYmmdd format. Figure 4 is an example of such a directory tree.

Figure 4. Labeled (dated) directories for combination Kickstart/yum trees

Populate the dated directory's Fedora/RPMS subdirectory with the latest RPMs. Copy the base directory from the original OS install media. In the end, the dated directory should resemble the os and dist directories described previously.
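Creating the empty skeleton for a new labeled build could be scripted along these lines. The function name is an assumption; the label defaults to today's date in YYYYmmdd form, and populating the RPMS and base directories is still up to you.

```shell
#!/bin/sh
# Create an empty labeled tree under release root $1 (for example,
# {document root}/FC2-i386) and print its path. $2 optionally
# overrides the label, which defaults to today's date.
new_labeled_tree() {
    root=$1
    label=${2:-$(date +%Y%m%d)}
    tree="$root/dist-$label"
    mkdir -p "$tree/Fedora/RPMS" "$tree/Fedora/base"
    echo "$tree"
}

# Usage (hypothetical path):
#   new_labeled_tree /var/www/html/FC2-i386
```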

Then promote each build through a test cycle:

  1. Point designated scratch machines directly to the dated directories, such as:

    http://{update-server}/FC2-i386/dist-20050105

    The scratch machines' Kickstart and yum configurations will change with every build, to point to the (new) dated directory.

  2. When a label has proven somewhat stable, release it to a wider audience. Designated test machines are integration areas for homegrown and third-party software. Your internal developers and QA staff will primarily use these.

    Create a symbolic link dist-testing that points to the dated build directory (dist-20050105 in this example). In turn, the test machines' Kickstart and yum configurations point to the symbolic link abstraction:

    http://{update-server}/FC2-i386/dist-testing

    The test machines' Kickstart and yum configurations don't change. They are free to update from this repo at leisure.

  3. After the build label has proven itself on the test machines, release it to the production hosts. Create a symbolic link, dist-stable, that points to the build's dated directory. Production hosts point to this designated "stable" URL:

    http://{update-server}/FC2-i386/dist-stable

Promote each build through this cycle to ensure RPM updates' stability and compatibility with your environment. Of course, you're free to add more steps to the promotion cycle, or split your production machines into A/B groups that update on staggered schedules.
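The promotion itself boils down to repointing a symbolic link. The sketch below assumes the dist-{label}, dist-testing, and dist-stable names used above; the function name is hypothetical, and it relies on the -n option of ln found in the GNU and BSD versions.

```shell
#!/bin/sh
# Point dist-<stage> at dist-<label> inside release root $1.
# $2 is the build label, $3 is "testing" or "stable".
promote() {
    root=$1
    label=$2
    stage=$3
    if [ ! -d "$root/dist-$label" ]; then
        echo "no such build: dist-$label" >&2
        return 1
    fi
    case $stage in
        testing|stable)
            # A relative target keeps the link valid if the tree moves;
            # -n replaces the link itself instead of descending into it.
            ln -sfn "dist-$label" "$root/dist-$stage"
            ;;
        *)
            echo "unknown stage: $stage" >&2
            return 1
            ;;
    esac
}

# Usage (hypothetical path):
#   promote /var/www/html/FC2-i386 20050105 testing
#   promote /var/www/html/FC2-i386 20050105 stable
```

Check that your web server is configured to follow symbolic links under the document root; otherwise clients will not see the promoted trees.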

Conclusion

Running your own yum repo saves time and bandwidth, and gives you much more control than pointing your machines to the public repos. Folding this into the Kickstart process brings you closer to a shop that runs itself.

This article barely touched on yum client maintenance, such as the occasional cache cleansing. See the Resources section for links to the yum documentation.

Resources

Q Ethan McCallum grew from curious child to curious adult, turning his passion for technology into a career.



Copyright © 2009 O'Reilly Media, Inc.