macdevcenter.com

Integrating Xgrid into Cocoa Applications, Part 1

The Xgrid Application Client

The last part of the puzzle is the Xgrid Clients. There are now two: a Cocoa app unusually christened Xgrid, and the command-line Client, going by the imaginative title of xgrid. We will spend most of our time working with the command-line tool, but let's take a quick look at the GUI to test our installation, and see how things work in practice.



Go to the /Applications directory, and open the Xgrid app. You should be presented with a choice of available Controllers. Probably only your local Controller will be visible. You can either choose to connect to the local Controller, or you can click the button Start Local Service, which can be used for testing purposes.

Having connected to a Controller, you should see the New Job window. Here you can see the available clusters. Double click the cluster called "Rendezvous." You should see which Agents are connected to the Controller. You will probably see only your own computer here, and hopefully it has the status 'Available'.

Returning to the New Job window, you can see a number of job types listed in the column on the right. Each of these is an Xgrid plugin, and each submits a different type of job. You can write and install such plugins yourself by following the documentation in /Library/Xgrid/Developer.

We are going to quickly run the Mandelbrot plugin. Double click on it. If the calculation doesn't start, click the Start button. This plugin calculates the Mandelbrot set, which you should see begin to appear. You should also see a Tachometer window, which indicates how many MHz are being applied to the job across the whole Xgrid. If you are just using one computer, the reading should be about the same as your CPU speed. If you install Xgrid on another computer, and connect both Agents to the Controller, the tachometer should give a value equal to the sum of the available CPU speeds.

The Mandelbrot Xgrid plugin, with Tachometer

The Command-Line Client

We are now going to take a detailed look at the command-line Client, because it will teach us more about how Xgrid works, and also because it is currently the best way to make use of Xgrid from a standalone Cocoa application.

The command-line Client is accessed through the xgrid command. The best place to read about it is in the man page. Simply open a terminal window, and type

man xgrid

In submitting any job to Xgrid, the Client has to provide a number of things to the Controller. The first is the executable to be run. More often than not, this will be a shell script, but it could also be a Cocoa tool, for example. Along with the executable, the Client has the option of supplying a standard input file and/or a directory of files. This directory will be copied by the Controller to the Agent running the job, and becomes the working directory on the Agent machine (that is, the directory where the executable will run). You can supply anything in the working directory, from input files to libraries and executables.

After a job completes, the standard output and error streams are returned to the Controller. The Client can choose to have these streams piped to files, or they can simply be directed to the output/error streams of the shell that submitted the job. Any other files created in the working directory on the Agent can also be retrieved, if requested by the Client. These are returned to an output directory indicated when the job is submitted.

Example: Distributed Builds with Xgrid

Let's see a concrete example. We are going to write a simple script that will perform distributed builds, like Xcode. A nice thing about using Xgrid for this is that it is not necessary to install our application (in this case, the gcc compiler) on each computer in the cluster. Instead, if we choose, we can send our executables along with each job. This is an important distinction between grid applications and other distributed applications like Xcode: the latter need to be preinstalled on every system.

You can download the script described here, along with some Objective-C code to practice on, by clicking here. You can also practice with your own source code if you like, of course. The source code provided here is taken from an open source Cocoa plotting framework I developed called 'Narrative'. I use this framework in the Cocoa application Trade Strategist, which is Technical Analysis software for the stock market. You can download the full source of Narrative here.

The first lines of the script look like this:


#!/bin/sh

#-------------
# Set filenames variable to be everything on the
# command line.
#-------------
filenames=$*

The shebang indicates that this script will be run in a Bourne shell (which is actually bash on Mac OS X). The variable filenames is set on the next executable line. In a Bourne shell, the $* variable represents anything that appears on the command line after the script name. The script thus expects that the names of the files to be compiled will be given on the command line.
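
One consequence of using $* is that the shell splits its value on whitespace, so a file name containing a space would be treated as two names. The following sketch is not part of xgridcc; it simply contrasts unquoted $* with "$@". The function names count_star and count_at, and the file names, are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Count the words seen when looping over unquoted $* (as xgridcc does).
count_star() {
  n=0
  for f in $*; do n=$((n + 1)); done
  echo "$n"
}

# Count the arguments seen when looping over "$@", which keeps each
# argument intact even if it contains spaces.
count_at() {
  n=0
  for f in "$@"; do n=$((n + 1)); done
  echo "$n"
}

star_count=$(count_star "My View.m" Plot.m)
at_count=$(count_at "My View.m" Plot.m)
echo "unquoted \$*: $star_count words"
echo "quoted \"\$@\": $at_count arguments"
```

For source files without spaces in their names, which is the case for the code used here, the two forms behave the same.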

The next lines set the locations of a number of temporary directories, which will be used to prepare our jobs and retrieve output.


#-------------
# Set path to output directory, and the submit directory.
# Submit directory is the current directory.
# Output directory is in /tmp. Create it here.
#-------------
outputdir=/tmp/xgridcc_outdir_$$
mkdir -p $outputdir
submitdir=`pwd`

The variable outputdir holds the path to the directory where we want xgrid to return the output files. We put this in /tmp so as to minimize the risk of damaging useful data. Another special shell variable is used in forming the directory name: $$ expands to the process ID (PID) of the running shell. Using it helps avoid naming conflicts with other processes. After the output directory path has been set, the directory is created with mkdir. The next line sets the submitdir variable to the path of the current directory. The pwd command prints the path of the current directory, and the back quotes indicate that the output of the command should be substituted in place of the command itself.
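
To see these two features in isolation, here is a small sketch that can be run on its own. It uses the same naming scheme as xgridcc, and also shows $(...), which is an equivalent and nestable form of the back quotes. The variable names submitdir_backquote and submitdir_dollar are made up for this demonstration.

```shell
#!/bin/sh
# $$ expands to the PID of the running shell, so the directory name
# differs between concurrent runs of the script.
outputdir=/tmp/xgridcc_outdir_$$
mkdir -p "$outputdir"

# Back quotes and $(...) are two spellings of command substitution:
# the command's output replaces the command itself.
submitdir_backquote=`pwd`
submitdir_dollar=$(pwd)

echo "output directory: $outputdir"
echo "submit directory: $submitdir_dollar"

# Clean up the demonstration directory.
rm -rf "$outputdir"
```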

Now we begin a loop over the files to be compiled:


#-------------
# Loop over files. Create one job per file.
#-------------
for filename in $filenames
do

  #-------------
  # Change to submission directory
  #-------------
  cd "$submitdir"

  #-------------
  # Setup variables for input directory, and
  # job file.
  #-------------
  inputdir=/tmp/xgridcc_inpdir_$$_$filename
  xgridscript=$inputdir/run
  mkdir -p $inputdir

At the beginning of the loop, we make sure that we're in the directory where the job was originally submitted. Then we set some more variables: one for the input directory, which becomes the working directory on the Agent, and one for the path to the script that will be passed to the xgrid command. Each iteration of the loop will submit a single job to the Xgrid, with a script to compile a single file. Each submission needs its own input directory, which is why it is created here rather than outside the loop.

Now that we have an input directory, we need to populate it with the files needed to perform a compilation.


  #-------------
  # Copy gcc to input directory.
  #-------------
  cp /usr/bin/gcc $inputdir/gcc

  #-------------
  # Copy source file to the input directory
  # Copy all headers to input directory
  #-------------
  cp $filename $inputdir
  cp *.h $inputdir

We could install gcc on every computer, and make use of the installed copy in our job, but to demonstrate a more typical use, we supply our own version of gcc in the input directory. The file to be compiled ($filename), and all header files, are also copied to the input directory. In a more sophisticated build system, we would copy only the headers needed to compile the file, but we will sacrifice efficiency for simplicity here.
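
As a rough idea of what "only the headers needed" might look like, the sketch below lists the headers that a source file pulls in directly with #import "..." or #include "...". It is not part of xgridcc. The function name direct_headers and the sample file are hypothetical, and the approach is deliberately naive: it ignores system headers in angle brackets and does not follow imports nested inside other headers.

```shell
#!/bin/sh
# Hypothetical helper: print the quoted headers a source file imports
# or includes directly. Nested imports are not followed.
direct_headers() {
  sed -n \
    -e 's/^[[:space:]]*#[[:space:]]*import[[:space:]]*"\([^"]*\)".*/\1/p' \
    -e 's/^[[:space:]]*#[[:space:]]*include[[:space:]]*"\([^"]*\)".*/\1/p' \
    "$1"
}

# Build a throwaway source file to try the helper on.
demo=/tmp/xgridcc_hdrdemo_$$
mkdir -p "$demo"
cat > "$demo/Plot.m" <<'eor'
#import <Cocoa/Cocoa.h>
#import "Plot.h"
#include "Axis.h"
eor

headers=$(direct_headers "$demo/Plot.m")
echo "$headers"

rm -rf "$demo"
```

In the build script, the output of such a helper could replace the *.h wildcard in the cp command, at the cost of missing any header that is only included indirectly.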

You will recall that above we set a variable for the path to a script file that will be the executable run by Xgrid on the Agent machine. We have a path, but still need to create the file itself. Here is how:


  #-------------
  # Create the script that will be submitted for this file
  # to xgrid.
  # Make it executable.
  #-------------
  cat <<eor > $xgridscript
#!/bin/sh
./gcc -c -O5 $filename
eor

  chmod +x $xgridscript

We use cat to write two lines to its output stream, which is redirected to the file at the path we created earlier ($xgridscript). The special operator << begins a here-document: it indicates that the input stream should be read from the lines that follow. Using <<eor, the input is read from the script file up until the next line that contains only eor. Because the delimiter is not quoted, $filename is expanded when the file is written, so the generated script contains the actual file name. (Note that there is nothing special about the string 'eor'; you can use any string you like.)
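
The difference between an unquoted and a quoted delimiter is easy to check in isolation. The sketch below captures the text of two here-documents that differ only in the quoting of eor. The file name Plot.m and the variable names expanded and literal are placeholders chosen for this example.

```shell
#!/bin/sh
filename=Plot.m   # hypothetical file name, for illustration only

# Unquoted delimiter: $filename is expanded as the text is read.
expanded=$(cat <<eor
./gcc -c $filename
eor
)

# Quoted delimiter: the text is copied literally, with no expansion.
literal=$(cat <<'eor'
./gcc -c $filename
eor
)

echo "unquoted: $expanded"
echo "quoted:   $literal"
```

xgridcc relies on the unquoted form, because the Agent has no way of knowing the value of $filename on the submitting machine.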

The script that we create is simplicity incarnate. It just compiles our file with the gcc compiler in the working directory. Remember that this script will be run on the Agent machine, in the working directory. The working directory contains the version of gcc that we copied earlier, and we want to use this copy, not the copy in /usr/bin. The -O5 flag requests high optimization (gcc treats any level above 3 as -O3), and is used here simply to make the compilation last a bit longer. After the script has been created, it is made executable using the chmod command.

After all of that, we are finally ready to submit the job.


  #-------------
  # Submit job to xgrid
  #-------------
  cd "$inputdir"
  xgrid -h localhost -job run -in "$inputdir" -out "$outputdir" run &

done

We first change to the input directory, where the 'run' script is located, and then use the xgrid command to run the job. The -h option indicates the Controller machine we are submitting to. In this case, it is simply localhost. The next option is -job run, which indicates we are running a job synchronously; that is, the xgrid command will not return until the job has completed. The -in option gives the path of the input directory, and -out the output directory. Lastly, the name of the executable script is supplied, and the whole command is run as a background subprocess using &. If we didn't do this, the script would wait until one job was complete before submitting another. Our intention is to submit all jobs simultaneously.

The end is in sight.


#-------------
# Wait until all subprocesses are finished.
#-------------
wait

#-------------
# Move output files back to submit directory.
#-------------
mv $outputdir/*/*.o "$submitdir"

#-------------
# Remove temporary directories
#-------------
for filename in $filenames
do
  inputdir=/tmp/xgridcc_inpdir_$$_$filename
  rm -rf $inputdir
done
rm -rf $outputdir

The wait command causes the script to block until all background subprocesses have completed. After all jobs have finished, we move all object files from the output directory back to the submit directory. Lastly, another loop over file names is used to delete the input directories, and the output directory is also deleted.
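
The combination of & and wait can be tried without an Xgrid installation. In the sketch below, a hypothetical function called fake_job stands in for the synchronous xgrid command: it sleeps for a second and then writes a result file. The directory name and job names are also invented for the demonstration.

```shell
#!/bin/sh
# Stand-in for a synchronous xgrid submission: each "job" sleeps briefly
# and then writes a result file. The function name is hypothetical.
fake_job() {
  sleep 1
  echo "done $1" > "$2/$1.out"
}

workdir=/tmp/xgridcc_waitdemo_$$
mkdir -p "$workdir"

start=$(date +%s)
for name in a b c
do
  fake_job "$name" "$workdir" &   # & runs each job as a background subprocess
done
wait                              # block until every background job has exited
end=$(date +%s)

# Count the result files to confirm that every job completed.
finished=$(ls "$workdir" | wc -l | tr -d ' ')
echo "$finished jobs finished in about $((end - start)) second(s)"

rm -rf "$workdir"
```

Because the three jobs sleep concurrently, the elapsed time should be close to one second rather than three. Removing the & makes the jobs run one after another.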

The only thing left to do is try it. In a directory containing your source files and a copy of the script xgridcc, issue the following command:


time ./xgridcc *.m

This will submit the jobs, and provide timings to boot.

My tests with xgridcc show that grid computing is more difficult than falling off a log, at least if you want to garner some advantage from it. If I compile Narrative on my 600MHz iBook, without using Xgrid, it takes 1 minute, 43 seconds. Using Xgrid with a single Agent, it takes roughly 1.7 times as long: 2 minutes, 57 seconds. So, the overhead of Xgrid, which includes communication, copying files on the Client, and transferring files between Client, Controller, and Agent, is significant. When I added a 400MHz iMac as a second Agent, it took 2 minutes, 2 seconds. Even with two computers, xgridcc was still slower than gcc.
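
To put numbers on that overhead, the timings can be converted to seconds and compared. The sketch below uses only the three figures quoted in this article; the variable names are arbitrary.

```shell
#!/bin/sh
# Timings quoted in the text, converted to seconds.
plain=$((1 * 60 + 43))       # gcc alone on the iBook
one_agent=$((2 * 60 + 57))   # xgridcc with a single Agent
two_agents=$((2 * 60 + 2))   # xgridcc with the iBook and the iMac

# The shell only does integer arithmetic, so awk computes the ratios.
ratio_one=$(awk "BEGIN { printf \"%.2f\", $one_agent / $plain }")
ratio_two=$(awk "BEGIN { printf \"%.2f\", $two_agents / $plain }")

echo "one Agent:  ${ratio_one}x the plain gcc time"
echo "two Agents: ${ratio_two}x the plain gcc time"
```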

The results are not that impressive, but don't lose heart. We have not made much effort to optimize xgridcc. We could improve it in any number of ways. For one, we could submit multiple files in each job. This would improve the ratio of computation to communication, and reduce the copying and transfer of data. Another improvement would simply be to use something faster, or more plentiful, than a single 400MHz iMac. At my place of work, I have a 466MHz G4, but a colleague has just received a dual 1.8GHz G5. I can use Xgrid to submit jobs to the G5, without having to log on. For me, the performance gain of Xgrid far outweighs the overhead.
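
As a starting point for the first of those improvements, here is one way the file list might be divided into batches, with each batch destined for a single job. This is only a sketch: the batch size, the function names make_batches and submit_batch, and the file names are all hypothetical, and submit_batch merely prints what a job would contain instead of calling xgrid. Like xgridcc itself, it assumes file names without spaces.

```shell
#!/bin/sh
# Hypothetical batch size: how many source files go into each job.
batchsize=2

# Stand-in for the real submission step; it only reports what a job
# would contain. In xgridcc, this is where the input directory would be
# filled and the xgrid command invoked.
submit_batch() {
  echo "job: $*"
}

# Walk the argument list, handing batchsize files at a time to submit_batch.
make_batches() {
  while [ $# -gt 0 ]
  do
    batch=""
    count=0
    while [ $# -gt 0 ] && [ "$count" -lt "$batchsize" ]
    do
      batch="$batch $1"
      shift
      count=$((count + 1))
    done
    submit_batch $batch   # unquoted on purpose: split back into file names
  done
}

jobs_listing=$(make_batches A.m B.m C.m D.m E.m)
echo "$jobs_listing"
```

With five files and a batch size of two, this yields three jobs, the last holding a single file. The generated run script for each job would then need one compile line per file in the batch.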

Next Time...

That's it for now. In part two we are going to start integrating Xgrid into a standalone Cocoa application. We will try something a little more exciting than compilation: batch image processing. You'll be able to apply effects to your whole iPhoto library in one hit, rather than laboriously going through them one at a time. Until then, think distributed.

Drew McCormack works at the Free University in Amsterdam, and develops the Cocoa shareware Trade Strategist.

