Integrating Xgrid into Cocoa Applications, Part 1
The Xgrid Application Client
The last part of the puzzle is the Xgrid Clients. There are now two:
a Cocoa app unusually christened Xgrid, and the command-line
Client, going by the imaginative title of
xgrid. We will
spend most of our time working with the command-line tool, but let's
take a quick look at the GUI to test our installation, and see how things
work in practice.
Go to the
/Applications directory, and open the Xgrid app.
You should be presented with a choice of available Controllers. Probably
only your local Controller will be visible. You can either choose to
connect to the local Controller, or you can click the button Start Local
Service, which can be used for testing purposes.
Having connected to a Controller, you should see the New Job window. Here you can see the available clusters. Double click the cluster called "Rendezvous." You should see which Agents are connected to the Controller. You will probably see only your own computer here, and hopefully it has the status 'Available'.
Returning to the New Job window, you can see a number of job types listed
in the column on the right. Each of these is an Xgrid plugin, which
can be made to submit different types of jobs. You can write and install
these plugins yourself by following the Xgrid documentation.
We are going to quickly run the Mandelbrot plugin. Double click on it. If the calculation doesn't start, click the Start button. This plugin calculates the Mandelbrot set, which you should see begin to appear. You should also see a Tachometer window, which indicates how many MHz are being applied to the job across the whole Xgrid. If you are just using one computer, the reading should be about the same as your CPU speed. If you install Xgrid on another computer, and connect both Agents to the Controller, the tachometer should give a value equal to the sum of the available CPU speeds.
The Command-Line Client
We are now going to take a detailed look at the command-line Client, because it will teach us more about how Xgrid works, and also because it is currently the best way to make use of Xgrid from a standalone Cocoa application.
The command-line Client is accessed through the xgrid command.
The best place to read about it is the man page. Simply open a terminal
window, and type man xgrid.
In submitting any job to Xgrid, the Client has to provide a number of things to the Controller. The first is the executable to be run. More often than not, this will be a shell script, but it could also be a Cocoa tool, for example. Along with the executable, the Client has the option of supplying a standard input file and/or a directory of files. This directory will be copied by the Controller to the Agent running the job, and becomes the working directory on the Agent machine (that is, the directory where the executable will run). You can supply anything in the working directory, from input files to libraries and executables.
After a job completes, the standard output and error streams are returned to the Controller. The Client can choose to have these streams piped to files, or they can simply be directed to the output/error streams of the shell that submitted the job. Any other files created in the working directory on the Agent can also be retrieved, if requested by the Client. These are returned to an output directory indicated when the job is submitted.
Example: Distributed Builds with Xgrid
Let's see a concrete example. We are going to write a simple script that will perform distributed builds, like Xcode. A nice thing about using Xgrid for this is that it is not necessary to install our application — in this case the gcc compiler — on each computer in the cluster. Instead, if we choose, we can send our executables along with each job. This is an important distinction between grid applications, and other distributed applications like Xcode: the latter need to be preinstalled on every system.
You can download the script described here, along with some Objective-C code to practice on, by clicking here. You can also practice with your own source code if you like, of course. The source code provided here is taken from an open source Cocoa plotting framework I developed called 'Narrative'. I use this framework in the Cocoa application Trade Strategist, which is Technical Analysis software for the stock market. You can download the full source of Narrative here.
The first lines of the script look like this:
#!/bin/sh
#-------------
# Set filenames variable to be everything on the
# command line.
#-------------
filenames=$*
The shebang indicates that this script will be run in a Bourne shell
(which is actually bash on Mac OS X). The variable filenames
is set on the next executable line. In a Bourne shell, the $*
variable represents everything that appears on the command line after
the script name. The script thus expects that the names of the files
to be compiled will be given on the command line.
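One consequence of collecting arguments with $* is worth seeing in isolation: the names are joined into a single string, and an unquoted loop over that string splits on whitespace. The sketch below is only an illustration; count_args, show_split, and show_preserved are hypothetical helpers, and the file names are made up.

```shell
#!/bin/sh
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() {
    echo $#
}

# show_split mimics "filenames=$*" followed by an unquoted use of the variable.
show_split() {
    filenames=$*
    count_args $filenames
}

# show_preserved passes the arguments on with "$@", keeping each one intact.
show_preserved() {
    count_args "$@"
}

show_split "My View.m" Plot.m       # prints 3: the space splits the first name
show_preserved "My View.m" Plot.m   # prints 2
```

For source files without spaces in their names, as in this article, the two approaches behave the same.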
The next lines set the locations of a number of temporary directories, which will be used to prepare our jobs and retrieve output.
#-------------
# Set path to output directory, and the submit directory.
# Submit directory is the current directory.
# Output directory is in /tmp. Create it here.
#-------------
outputdir=/tmp/xgridcc_outdir_$$
mkdir -p $outputdir
submitdir=`pwd`
The variable outputdir holds the path to the directory where we want
xgrid to return the output files. We put this in
/tmp so as to minimize the risk of damaging useful data.
Another Bourne shell variable is used in forming the directory name:
$$ expands to the process ID (PID) of the running shell.
Using this helps avoid naming conflicts with other processes. After
the output directory path has been set, the directory is created with
mkdir. The next line sets the submitdir variable
to the path of the current directory. The
pwd command writes
the path of the current directory, and the back quotes indicate that
the output of the command should be inserted in place of the command itself.
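The two shell features just described, the $$ parameter and command substitution, can be tried on their own. This is a minimal sketch, not part of the xgridcc script: make_outdir is a hypothetical helper, and the use of TMPDIR as a fallback-aware parent directory is an assumption added for the demonstration.

```shell
#!/bin/sh
# make_outdir is a hypothetical helper: it builds a PID-based directory name
# under the given parent, creates the directory, and prints its path.
make_outdir() {
    parent=$1
    dir="$parent/xgridcc_outdir_$$"
    mkdir -p "$dir"
    echo "$dir"
}

# Command substitution: $(...) is the modern equivalent of back quotes.
submitdir=$(pwd)
outputdir=$(make_outdir "${TMPDIR:-/tmp}")

echo "submitted from: $submitdir"
echo "output goes to: $outputdir"

rmdir "$outputdir"   # clean up the empty directory created above
```

A PID-based name is unique among running processes but is predictable; on systems that provide it, mktemp -d is a sturdier way to get a private temporary directory.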
Now we begin a loop over the files to be compiled:
#-------------
# Loop over files. Create one job per file.
#-------------
for filename in $filenames
do
#-------------
# Change to submission directory
#-------------
cd "$submitdir"
#-------------
# Setup variables for input directory, and
# job file.
#-------------
inputdir=/tmp/xgridcc_inpdir_$$_$filename
xgridscript=$inputdir/run
mkdir -p $inputdir
At the beginning of the loop, we make sure that we're in the directory
where the job was originally submitted. Then we set some more variables:
one for the input directory, which becomes the working directory on the
Agent, and one for the path to the script that will be passed to the
xgrid command. Each iteration of the loop will submit a single job to the
Xgrid, with a script to compile a single file. Each submission needs
its own input directory, which is why it's created here, rather than
outside the loop.
Now that we have an input directory, we need to populate it with the files needed to perform a compilation.
#-------------
# Copy gcc to input directory.
#-------------
cp /usr/bin/gcc $inputdir/gcc
#-------------
# Copy source file to the input directory
# Copy all headers to input directory
#-------------
cp $filename $inputdir
cp *.h $inputdir
We could install
gcc on every computer, and make use of
the installed copy in our job, but to demonstrate a more typical use,
we supply our own version of
gcc in the input directory.
The file to be compiled ($filename), and all header files,
are also copied to the input directory. In a more sophisticated build
system, we would copy only the headers needed to compile the file, but
we will sacrifice efficiency for simplicity here.
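As a rough idea of what "only the headers needed" could look like, the sketch below lists the headers a source file names directly in quoted #import or #include lines. It is an assumption-laden illustration, not part of xgridcc: local_headers is a hypothetical helper, the header names are invented, and headers that import other headers are not followed.

```shell
#!/bin/sh
# local_headers is a hypothetical helper: it prints the header names that a
# source file pulls in with #import "..." or #include "..." (quoted form only).
# Angle-bracket system headers are ignored, and nested imports are not followed.
local_headers() {
    sed -n \
        -e 's/^[[:space:]]*#[[:space:]]*import[[:space:]]*"\([^"]*\)".*/\1/p' \
        -e 's/^[[:space:]]*#[[:space:]]*include[[:space:]]*"\([^"]*\)".*/\1/p' \
        "$1"
}

# Build a small throwaway source file to try the helper on.
demo="${TMPDIR:-/tmp}/headers_demo_$$.m"
printf '%s\n' \
    '#import <Cocoa/Cocoa.h>' \
    '#import "PlotView.h"' \
    '#include "Axis.h"' \
    'int main(void) { return 0; }' > "$demo"

local_headers "$demo"    # prints PlotView.h, then Axis.h

rm -f "$demo"
```

In the build script, the output of such a helper could replace the *.h wildcard, at the cost of missing any header that is only reached indirectly.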
You will recall that above we set a variable for the path to a script file that will be the executable run by Xgrid on the Agent machine. We have a path, but still need to create the file itself. Here is how:
#-------------
# Create the script that will be submitted for this file
# to xgrid.
# Make it executable.
#-------------
cat <<eor > $xgridscript
#!/bin/sh
./gcc -c -O5 $filename
eor
chmod +x $xgridscript
Here we use cat to write two lines to the output stream,
which is redirected to the path we created earlier ($xgridscript).
The special operator
<< indicates that the input
stream should be read from the lines that follow. Using << eor,
the input stream is read from the script file up until the next line
containing only eor. (Note that there is nothing special
about the string 'eor'; you can use any string you like.)
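One detail of here-documents matters for generated scripts: with an unquoted delimiter, variables such as $filename are expanded while the script is being written, which is exactly what xgridcc relies on. Quoting the delimiter turns that off. The sketch below contrasts the two; the functions expanded and literal and the file name Plot.m are hypothetical.

```shell
#!/bin/sh
filename=Plot.m   # hypothetical file name, for illustration only

# Unquoted delimiter: $filename is expanded while the text is generated.
expanded() {
cat <<eor
./gcc -c $filename
eor
}

# Quoted delimiter: the text is copied literally, so $filename is left
# untouched for whatever later reads the generated text.
literal() {
cat <<'eor'
./gcc -c $filename
eor
}

expanded   # prints: ./gcc -c Plot.m
literal    # prints: ./gcc -c $filename
```

The terminating line must contain the delimiter alone, starting in the first column, which is why the body of a here-document is usually left unindented.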
The script that we create is simplicity incarnate. It just compiles
our file with the
gcc compiler in the working directory.
Remember that this script will be run on the Agent machine, in the working
directory. The working directory contains the version of gcc
that we copied earlier, and we want to use this copy, not the copy in
/usr/bin. The -O5 flag requests a high level of optimization,
and is just designed in this case to make the compilation last a bit
longer. After the script has been created, it is made executable using chmod.
After all of that, we are finally ready to submit the job.
#-------------
# Submit job to xgrid
#-------------
cd "$inputdir"
xgrid -h localhost -job run -in "$inputdir" -out "$outputdir" run &
done
We first change to the input directory, where the 'run' script is located,
and then use the
xgrid command to run the job. The -h
option indicates the Controller machine we are submitting to. In this
case, it is simply
localhost. The next option is -job
run, which indicates we are running a job synchronously, that is, the
xgrid command will not return until the job has completed. The
-in option gives the path of the input directory, and
-out, the output directory. Lastly, the name of the executable
script is supplied, and the whole command is put into a background
subprocess using &. If we didn't do this, the script would wait until
one job was complete before submitting another. Our intention is to
submit all jobs simultaneously.
The end is in sight.
#-------------
# Wait until all subprocesses are finished.
#-------------
wait
#-------------
# Move output files back to submit directory.
#-------------
mv $outputdir/*/*.o "$submitdir"
#-------------
# Remove temporary directories
#-------------
for filename in $filenames
do
    inputdir=/tmp/xgridcc_inpdir_$$_$filename
    rm -rf $inputdir
done
rm -rf $outputdir
The wait command causes the script to block until all subprocesses
have completed. After all jobs have finished, we move all object files
from the output directory back to the submit directory. Lastly, another
loop over file names is used to delete the input directories, and the
output directory is also deleted.
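The pattern of launching several commands with & and then collecting them with wait can be tried without an Xgrid installation. In this minimal sketch, fake_job is a hypothetical stand-in for one background xgrid submission: it sleeps for a second and then leaves a marker file behind.

```shell
#!/bin/sh
# fake_job is a hypothetical stand-in for one "xgrid ... &" submission:
# it sleeps briefly, then creates a marker file named after the job.
fake_job() {
    sleep 1
    : > "$1/$2.done"
}

workdir="${TMPDIR:-/tmp}/wait_demo_$$"
mkdir -p "$workdir"

for name in a b c
do
    fake_job "$workdir" "$name" &    # each job runs in the background
done

wait                                 # block until every background job exits

# Count the marker files; tr strips the padding some wc versions add.
finished=$(ls "$workdir" | wc -l | tr -d ' ')
echo "jobs finished: $finished"

rm -rf "$workdir"
```

Because the three jobs overlap, the whole run takes roughly one second rather than three. Without the wait, the count would be taken before any marker file existed.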
The only thing left to do is try it. In a directory containing your
source files and a copy of the script,
xgridcc, issue the following command:
time xgridcc *.m
This will submit the jobs, and provide timings to boot.
My tests with
xgridcc show that grid computing is more
difficult than falling off a log, at least if you want to garner some
advantage from it. If I compile Narrative on my 600MHz iBook, without
using Xgrid, it takes 1 minute, 43 seconds. Using Xgrid with a single Agent,
it takes about 1.7 times as long: 2 minutes, 57 seconds. So, the overhead of
Xgrid, which includes communication, copying files on the Client, and
transferring files between Client, Controller, and Agent, is significant.
When I added a 400MHz iMac to my iBook, it took 2 minutes, 2 seconds. Even
with two Agents, xgridcc was still slower than compiling locally without Xgrid.
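Converting the three timings quoted above to seconds makes the overhead easier to compare. The sketch below does the arithmetic with shell integer math; to_seconds and percent_of are hypothetical helpers, and the percentages are truncated to whole numbers.

```shell
#!/bin/sh
# to_seconds and percent_of are hypothetical helpers for this illustration.
# to_seconds MINUTES SECONDS prints the total number of seconds.
to_seconds() {
    echo $(( $1 * 60 + $2 ))
}

# percent_of A B prints A as a whole-number percentage of B (truncated).
percent_of() {
    echo $(( $1 * 100 / $2 ))
}

local_build=$(to_seconds 1 43)   # iBook alone, no Xgrid
one_agent=$(to_seconds 2 57)     # Xgrid, single Agent
two_agents=$(to_seconds 2 2)     # Xgrid, iBook plus iMac

echo "local build: ${local_build}s"
echo "one Agent:   ${one_agent}s ($(percent_of "$one_agent" "$local_build")% of local)"
echo "two Agents:  ${two_agents}s ($(percent_of "$two_agents" "$local_build")% of local)"
```

The timings are 103, 177, and 122 seconds, so the single-Agent run costs about 171% of the local build and the two-Agent run about 118%.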
The results are not that impressive, but don't lose heart. We have not
made much effort to optimize
xgridcc. We could improve
it in any number of ways. For one, we could submit multiple files in
each job. This would improve the ratio of communication to computation,
and reduce the copying and transfer of data. Another improvement would
simply be to use something faster, or more plentiful, than a single
400MHz iMac. At my place of work, I have a 466MHz G4, but a colleague
has just received a dual 1.8GHz G5. I can use Xgrid to submit jobs to
the G5, without having to log on. For me, the performance gain of Xgrid
far outweighs the overhead.
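The first improvement mentioned, submitting multiple files in each job, starts with grouping the file names. The sketch below shows one way that grouping might be done; batch_files is a hypothetical helper, not part of xgridcc, and it assumes file names contain no spaces.

```shell
#!/bin/sh
# batch_files is a hypothetical helper: it prints its file arguments in
# groups of at most $1 names, one group per line. Each printed line would
# become a single Xgrid job instead of one job per file.
batch_files() {
    size=$1
    shift
    batch=""
    count=0
    for f in "$@"
    do
        batch="${batch:+$batch }$f"
        count=$((count + 1))
        if [ "$count" -eq "$size" ]; then
            echo "$batch"
            batch=""
            count=0
        fi
    done
    # Emit the final, partially filled batch, if any.
    if [ -n "$batch" ]; then
        echo "$batch"
    fi
}

batch_files 2 A.m B.m C.m D.m E.m
```

With a batch size of 2, the five names come out as three lines: "A.m B.m", "C.m D.m", and "E.m". Larger batches mean fewer transfers of gcc and the headers, but also fewer jobs to spread across Agents, so the best size depends on the cluster.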
That's it for now. In part two we are going to start integrating Xgrid into a standalone Cocoa application. We will try something a little more exciting than compilation: batch image processing. You'll be able to apply effects to your whole iPhoto library in one hit, rather than laboriously going through them one at a time. Until then, think distributed.