 Published on MacDevCenter (http://www.macdevcenter.com/)


Integrating Xgrid into Cocoa Applications, Part 2

by Drew McCormack
05/18/2004

In Part 1 of this two-part extravaganza we covered a lot of ground, with barely a mention of Cocoa. We were witness to a future vision for Xgrid, installed its present incarnation, played with it a bit, and then got down and dirty on the command line. The title would suggest that there may actually be some Cocoa in these pages, and I am here to assure you that it wasn't all just a PR stunt to capture your attention. I will now deliver on the promise.

Photo Industry

In this part of our journey into Xgrid, we're going to develop a little Cocoa application called Photo Industry. This will be an Xgrid-enabled app, and what's more, it will be a standalone application, not an Xgrid client plugin. To achieve this goal, we'll leverage the xgrid command-line tool using a Cocoa class called NSTask. I hope that in the near future, perhaps as early as WWDC, Apple will make this technique obsolete by publishing an Xgrid client Cocoa API, but until that time we can at least appreciate the potential of Xgrid by wrapping the xgrid tool.

Photo Industry: an Xgrid-enabled JPEG image filtering application. Baby courtesy of proud parents.

Photo Industry is a JPEG image filtering program. The best way to see what it does is to download the finished product, and try it out. Here is how:

  1. Make sure you are running Mac OS X 10.3.x. Photo Industry will not work on earlier versions of Mac OS X.
  2. Download the Photo Industry app here, and put it somewhere on your hard disk. (By the way, if you are wondering who created the swish icon, it was Bobby, a young lad with a lot of talent, who will create polished icons for you at a fraction of the price of the big boys.)
  3. Start it up by double clicking.
  4. Start an Xgrid controller on your computer in the System Preferences panel, and start agents on one or more computers attached to your LAN or WLAN.
  5. In Photo Industry, select one or more filters from the table shown.
  6. In Finder, select a number of JPEG images (say <10), and drag them to the top image view in Photo Industry. WARNING: PLEASE DON'T USE YOUR SOLE COPY OF AUNTY JOAN'S WEDDING SNAPS. MAKE A COPY, OR USE IMAGES THAT YOU DON'T MIND LOSING SHOULD ANYTHING GO WRONG.
  7. When presented with an Open sheet, select a directory where you would like to have the output images end up.
  8. Wait. You can monitor progress on the progress bar, with the timer, and from the display of processed images that appear in the lower image view.
  9. If all goes well, your processed images should be in the output directory upon completion.

Play around with the app for a few minutes. See what happens to the processing time when you add an agent to your Xgrid "cluster" or take one away. And when you are ready, we can start looking at how it all works.

Wrapping xgrid in Cocoa

To get started, download the Xcode project and source code here. Unpack it, and open it in Xcode.

The most generic class in this project is called DistributedTask. It is the class that wraps the xgrid tool, providing its functionality to the rest of the Cocoa source. You could probably use this class directly in your own projects without any modification at all.

DistributedTask is really nothing more than an Xgrid class, and should an Apple Xgrid API ever appear, I would expect something very similar to DistributedTask to be in it. The name has been inspired by similarities with Cocoa's NSTask: DistributedTask is basically the same as NSTask, but runs a series of commands on a distributed Xgrid network, rather than just a single command on the local machine.

However, the name could cause some confusion, so let's clarify things before we start. What we call a distributed task in Photo Industry roughly corresponds to a job in Xgrid. A distributed task is made up of a number of subtasks, which are approximately the same as tasks in Xgrid terminology. Hopefully I've made that confusing enough for you.

The DistributedTask class itself is pretty straightforward, though it does involve multithreading. Here is the public interface:


#import <Foundation/Foundation.h>

@class DistributedTask;

@interface NSObject (DistributedTaskDelegateMethods)
-(void)distributedTaskDidLaunchSubTasks:(DistributedTask *)distributedTask;
-(void)distributedTask:(DistributedTask *)distributedTask 
    didFinishSubTaskWithIdentifier:(id)identifier;
-(void)distributedTaskDidFinishSubTasks:(DistributedTask *)distributedTask;
@end


@interface DistributedTask : NSObject {
    NSString *controllerURLString;
    NSMutableDictionary *subTasks;
    NSMutableSet *subTasksRunning;
    BOOL taskHasLaunched;
    id delegate;
}

-(id)initWithControllerURLString:(NSString *)controllerURLString;

-(void)setDelegate:(id)del;
-(id)delegate;

-(unsigned)numberOfSubTasks;

-(NSString *)controllerURLString;

-(void)addSubTaskWithIdentifier:(id)identifier
    launchPath:(NSString *)launchPath
    workingDirectoryPath:(NSString *)workingDirPath
    outputDirectoryPath:(NSString *)outputDirPath
    standardInputPath:(NSString *)standardInputPath
    standardOutputPath:(NSString *)standardOutputPath;
-(void)launch;


@end

As you can see, the class makes use of a delegate, which is a common technique in Cocoa. The delegate is registered to receive information about the distributed task when certain events occur. In this case, a delegate message is sent when the distributed task is launched, when one of its subtasks completes, and when all subtasks have finished.

The class interface block includes a number of attributes that are important for the implementation, and we will discuss these as they come up.

The DistributedTask initializer takes a single argument, namely the Xgrid controller to which it should connect. We just use localhost in Photo Industry, but you could very easily implement a more complicated controller location scheme involving Rendezvous.

The delegate accessor methods are also present in the interface, along with an accessor for the controller URL. Note that there is only a getter for the URL; you can't change the controller after you have initialized the class object, so if later you want to use a different controller, you need to create a new DistributedTask.

Two more rather important methods follow. The first is the method you use to add subtasks to the distributed task. It takes a number of arguments, such as the path to the command to launch; working directory path; output directory path; and standard input and output paths. You may recognize these arguments as corresponding to the command-line arguments of the xgrid command. That's no coincidence, because DistributedTask is really nothing more than a Cocoa xgrid command.

The last method, launch, is pretty self-explanatory. "Launch" is used instead of "run," or another apt verb, in an attempt at consistency with the terminology used in NSTask.

The implementation file of DistributedTask begins by declaring some private methods, and a number of strings.


@interface DistributedTask (PrivateMethods)
-(void)runSubTaskAsynchronouslyWithDictionary:(NSDictionary *)subTaskDict;
-(void)subTaskDidFinishWithDictionary:(NSDictionary *)subTaskDict;
@end


// SubTask dictionary keys
static NSString *SubTaskIdKey               = @"SubTaskIdKey";
static NSString *LaunchPathKey              = @"LaunchPathKey";
static NSString *OutputDirectoryKey         = @"OutputDirectoryKey";
static NSString *WorkingDirectoryKey        = @"WorkingDirectoryKey";
static NSString *StandardInputKey           = @"StandardInputKey";
static NSString *StandardOutputKey          = @"StandardOutputKey";

We'll address the private methods below. The strings are all keys for a dictionary. Rather than defining a separate SubTask class, I have opted to simply use dictionaries to store information pertaining to each subtask. Defining the keys like this reduces the chances of making a spelling error in a string, which would result in a run-time bug, and also makes explicit what entries a subtask dictionary contains. It is really very similar to defining a struct in C; after all, a dictionary is really no more than a dynamic struct.

The implementation proper begins with the initializer and deallocator.


@implementation DistributedTask


-(id)initWithControllerURLString:(NSString *)url {
    if ( self = [super init] ) {
        controllerURLString = [url copy];
        subTasks = [[NSMutableDictionary alloc] initWithCapacity:10];
        subTasksRunning = [[NSMutableSet alloc] initWithCapacity:10];
        taskHasLaunched = NO;
        delegate = nil;
    }
    return self;
}


-(void)dealloc {
    // Release the objects owned by this instance
    [controllerURLString release];
    [subTasks release];
    [subTasksRunning release];
    [super dealloc];
}

The initializer stores the controller URL, and creates some component objects, such as the subTasks dictionary and the subTasksRunning set. The dictionary is used to store details of the subtasks, and the set is used to keep track of which of the subtasks are running and thus which are complete. The taskHasLaunched boolean flag is used to keep track of whether the task has been launched before. A DistributedTask should only be launched once; if you need to repeat a calculation, simply create a new DistributedTask.

A number of straightforward accessors follow.


-(void)setDelegate:(id)del {
    delegate = del;
}

-(id)delegate {
    return delegate;
}

-(unsigned)numberOfSubTasks {
    return [subTasks count];
}

-(NSString *)controllerURLString {
    return controllerURLString;
}

Note that many attributes of the class do not include setters. You can't change the controller URL after initialization, for example. Making aspects of a class immutable, like this, simplifies writing and using the class.

The method to add subtasks to the task looks like this:


// The working directory, output directory, standard input and output are all
// optional. Use nil if they are not to be used.
#define SubNSNullForNil(var)    ( (var) == nil ? (id)[NSNull null] : (id)(var) )
-(void)addSubTaskWithIdentifier:(id)identifier
    launchPath:(NSString *)launchPath
    workingDirectoryPath:(NSString *)workingDirPath
    outputDirectoryPath:(NSString *)outputDirPath
    standardInputPath:(NSString *)standardInputPath
    standardOutputPath:(NSString *)standardOutputPath {
    NSDictionary *dict = [NSDictionary dictionaryWithObjectsAndKeys:
        identifier,                             SubTaskIdKey,
        launchPath,                             LaunchPathKey,
        SubNSNullForNil(workingDirPath),        WorkingDirectoryKey,
        SubNSNullForNil(outputDirPath),         OutputDirectoryKey,
        SubNSNullForNil(standardInputPath),     StandardInputKey,
        SubNSNullForNil(standardOutputPath),    StandardOutputKey,
        nil];
    [subTasks setObject:dict forKey:identifier];
    
    // Also add to the subTasksRunning set, which will be used to keep track of 
    // which subTasks have finished when the task is run.
    [subTasksRunning addObject:identifier];
}

This method has a lot of arguments. Each argument corresponds directly to an xgrid command option. As the comment states, you can pass nil for some of the arguments, and these will then not be passed along to xgrid.

The method itself adds a dictionary describing the subtask to the dictionary of all subtasks. The identifier of the subtask is used as the key in the subtasks dictionary. The keys of the dictionary describing the subtask were introduced above, and correspond directly with the arguments of the method. Note that arguments that are nil are substituted with an instance of NSNull, using the macro SubNSNullForNil. This is necessary because you can't enter nil in a dictionary.

The last line of the method adds the subtask identifier to the subTasksRunning mutable set. As explained above, this set will be used to determine how many subtasks are running, and how many have completed.

Now we are getting to the core of the class.


-(void)launch {    
    // Make sure this is the first launch
    if ( taskHasLaunched ) 
        @throw [NSException exceptionWithName:@"MultipleLaunchException"
            reason:@"Attempt to launch DistributedTask multiple times." 
            userInfo:nil];
    else 
        taskHasLaunched = YES;
    
    // Launch subtasks
    NSEnumerator *subTaskEn = [subTasks objectEnumerator];
    NSDictionary *subTaskDict;
    while ( subTaskDict = [subTaskEn nextObject] ) {
        // Create thread where xgrid task will be run
        [NSThread detachNewThreadSelector:
                @selector(runSubTaskAsynchronouslyWithDictionary:)
            toTarget:self 
            withObject:subTaskDict];
    }
    
    // Notify delegate of launch and initial progress
    if ( delegate && 
        [delegate respondsToSelector:
            @selector(distributedTaskDidLaunchSubTasks:)] ) 
        [delegate distributedTaskDidLaunchSubTasks:self];
}

The launch method runs the task. It first checks to see if the task has previously run; if it has, an exception is thrown. I use the new Objective-C exception handling facilities throughout Photo Industry, so if you haven't given them a look yet, this is a good opportunity.

Next, we iterate over the subtasks that are in the subTasks dictionary. For each one, we split off a new thread. The new thread is detached with instructions to call back to the runSubTaskAsynchronouslyWithDictionary: method of the DistributedTask object, which is set as the target, and to pass the subTaskDict dictionary to the method.

After all threads have been launched, a message is sent to the delegate of the class to indicate this. As usual, this message is only sent if there is a delegate set, and the delegate responds to distributedTaskDidLaunchSubTasks:.
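For readers who think more easily in Python, here is a rough sketch of the same launch logic: a flag that refuses a second launch, and one thread per subtask. It is not part of Photo Industry; the class name LaunchOnceTask and its methods are made up for illustration, and a plain callable stands in for the xgrid options that the real class stores.

```python
import threading


class LaunchOnceTask:
    """Hypothetical stand-in for the launch logic of DistributedTask."""

    def __init__(self):
        self.subtasks = {}         # identifier -> callable doing the work
        self.has_launched = False
        self.threads = []

    def add_subtask(self, identifier, work):
        # The real class stores xgrid options; a callable keeps the sketch short.
        self.subtasks[identifier] = work

    def launch(self):
        # A task may only be launched once, as in the Objective-C version.
        if self.has_launched:
            raise RuntimeError("Attempt to launch task multiple times.")
        self.has_launched = True
        # One thread per subtask; each thread receives its subtask's identifier.
        for identifier, work in self.subtasks.items():
            thread = threading.Thread(target=work, args=(identifier,))
            self.threads.append(thread)
            thread.start()

    def wait(self):
        # Block until every subtask thread has returned.
        for thread in self.threads:
            thread.join()
```

The wait method has no counterpart in DistributedTask, which reports completion through its delegate instead; it is only there so the sketch can be tried from a script.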

The runSubTaskAsynchronouslyWithDictionary: method that each thread calls is something of a monster.


#define ObjectForKeyIsNSNull(dict, key) \
    ([[dict objectForKey:key] isKindOfClass:[NSNull class]])

-(void)runSubTaskAsynchronouslyWithDictionary:(NSDictionary *)subTaskDict {
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    
    @try {
        // Determine the directory where the launch file is
        NSString *launchPath = 
            [subTaskDict objectForKey:LaunchPathKey];
        NSString *launchFileName = 
            [launchPath lastPathComponent];
        NSString *launchPathDir = 
            [launchPath stringByDeletingLastPathComponent];
         
        // Create and setup NSTask for xgrid command
        // Run task in the directory of the launch file
        NSTask *task = [[[NSTask alloc] init] autorelease];
        [task setCurrentDirectoryPath:launchPathDir];
        
        // Create an array with arguments for the NSTask
        NSMutableArray *args = [NSMutableArray arrayWithObjects:
            @"-h", controllerURLString,
            @"-job", @"run",
            nil];
        
        if ( ! ObjectForKeyIsNSNull(subTaskDict, WorkingDirectoryKey) ) {
            [args addObject:@"-in"];
            [args addObject:[subTaskDict objectForKey:WorkingDirectoryKey]];
        }
        
        if ( ! ObjectForKeyIsNSNull(subTaskDict, OutputDirectoryKey) ) {
            [args addObject:@"-out"];
            [args addObject:[subTaskDict objectForKey:OutputDirectoryKey]];
        }
        
        if ( ! ObjectForKeyIsNSNull(subTaskDict, StandardInputKey) ) {
            [args addObject:@"-si"];
            [args addObject:[subTaskDict objectForKey:StandardInputKey]];
        }
        
        if ( ! ObjectForKeyIsNSNull(subTaskDict, StandardOutputKey) ) {
            [args addObject:@"-so"];
            [args addObject:[subTaskDict objectForKey:StandardOutputKey]];
        }
        
        [args addObject:launchFileName];
        
        [task setArguments:args];
        [task setLaunchPath:@"/usr/bin/xgrid"];

        // Launch task, and wait for it to finish
        [task launch];
        [task waitUntilExit];
        int status = [task terminationStatus];
        if (status != 0) NSLog(@"xgrid task failed with status code %d", status);
    }
    @catch (NSException *exception) {
        NSLog(@"An %@ exception was raised in runSubTaskAsynchronouslyWithDictionary: %@", 
            [exception name], [exception reason] );
    }
    @finally {
        [self performSelectorOnMainThread:@selector(subTaskDidFinishWithDictionary:) 
            withObject:subTaskDict waitUntilDone:NO];
        [pool release];
    }
    
}

It begins by initializing an NSAutoreleasePool, and ends by releasing it. You should do this in any method that is called by a new thread. An autorelease pool is created for you in the main thread, but not in threads that you create yourself. If you don't set up an autorelease pool in each new thread, you will get annoying warning messages in the log, and every call to the autorelease method will result in a memory leak.

The rest of the method is embedded in a @try-@catch-@finally block. This is to catch any exceptions that may arise in the worker thread, and which could cause the program to crash. The exception handling here is not very advanced, but it is better than nothing. Basically, if something goes wrong, a log message is printed, and control returned to the main thread as if everything went to plan. It would be better to inform the user of the problem, but we'll leave that to version 2.0. Note that a @finally block gets executed whether an exception is raised or not, so the autorelease pool is always released, preventing potential memory leaks.
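The same shape can be sketched in a few lines of Python, purely as an illustration. The function name run_subtask and its two callable parameters are hypothetical; the point is only that the finally clause runs on both the success path and the failure path, so the completion notice is never skipped.

```python
import logging


def run_subtask(work, on_finished):
    """Run work(); always call on_finished(), even if work() raises."""
    try:
        work()
    except Exception as exc:
        # Mirror the article's handler: log the problem and carry on.
        logging.error("A %s exception was raised in run_subtask: %s",
                      type(exc).__name__, exc)
    finally:
        # Executed whether or not an exception occurred.
        on_finished()
```

Swallowing the exception like this has the same weakness as the Objective-C version: the caller cannot tell a failed subtask from a successful one unless on_finished is given more information.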

runSubTaskAsynchronouslyWithDictionary: simply extracts the data passed to it via the dictionary, and adds it to an array, with each entry coupled to an appropriate xgrid option. For example, the input directory, which becomes the working directory on the agent, is added to the array by first inserting the option string @"-in", followed by the path retrieved from the dictionary. This option is only included if the working directory value in the dictionary is not an instance of the class NSNull.
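To make the argument-building step concrete, here is a small Python sketch of the same idea. The option strings (-h, -job run, -in, -out, -si, -so) are the ones used in the Objective-C listing; the function name build_xgrid_args and its parameter names are my own, and None plays the role that NSNull plays in the subtask dictionary.

```python
def build_xgrid_args(controller, launch_file, working_dir=None,
                     output_dir=None, stdin_path=None, stdout_path=None):
    """Return the argument list for the xgrid tool, skipping unset options."""
    args = ["-h", controller, "-job", "run"]
    optional = (
        ("-in", working_dir),
        ("-out", output_dir),
        ("-si", stdin_path),
        ("-so", stdout_path),
    )
    for option, value in optional:
        # Only pass an option along when a value was actually supplied.
        if value is not None:
            args += [option, value]
    # The command to run on the agent always comes last.
    args.append(launch_file)
    return args
```

For example, build_xgrid_args("localhost", "agentrunscript.py", working_dir="/tmp/in") yields the options -h localhost -job run -in /tmp/in followed by agentrunscript.py.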

The entries in the array are set as the arguments of an NSTask, using the method setArguments:. Other aspects of the NSTask are set as well: the path of the executable to be launched by the NSTask, which is /usr/bin/xgrid, and the current directory, which is the directory containing the executable that xgrid will run.

If you are wondering where I got these paths from, you can take a look at the top of the method. There I make use of a few path manipulation methods from the NSString class, like lastPathComponent and stringByDeletingLastPathComponent. These methods are very useful, and used extensively throughout Photo Industry. The Cocoa documentation will tell you more.

Returning to the bottom of the method, the NSTask is launched, and we wait for it to complete, just like last time in the xgridcc script. We also check the return value upon completion, and log any errors.

Whether the NSTask completes without error or not, we need to return control to the main thread, informing the delegate that the subtask, and possibly the whole DistributedTask, is finished. To do that we use the convenience method performSelectorOnMainThread:withObject:waitUntilDone:. And it really is convenient, believe me.

The method called on the main thread looks like this:


-(void)subTaskDidFinishWithDictionary:(NSDictionary *)subTaskDict {
    id identifier = [subTaskDict objectForKey:SubTaskIdKey];
    [subTasksRunning removeObject:identifier];
    
    // Notify delegate of subtask completion
    if ( delegate && 
        [delegate respondsToSelector:
            @selector(distributedTask:didFinishSubTaskWithIdentifier:)] ) 
        [delegate distributedTask:self 
            didFinishSubTaskWithIdentifier:identifier];
    
    // If finished all subtasks, notify delegate
    int subTasksRemaining = [subTasksRunning count];
    if ( delegate && 
        [delegate respondsToSelector:@selector(distributedTaskDidFinishSubTasks:)] &&
        subTasksRemaining == 0 ) 
        [delegate performSelector:@selector(distributedTaskDidFinishSubTasks:) 
            withObject:self afterDelay:0.0];
}

This method keeps track of which subtasks have completed, and informs the delegate of developments. The subtask is first removed from the subTasksRunning set. The delegate is then informed that the subtask has completed, and lastly the delegate is informed that the distributed task has completed, if there are no subtasks still running.
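The bookkeeping is simple enough to sketch in Python. Everything named here (CompletionTracker, did_finish_subtask, did_finish_all) is hypothetical; the getattr check plays the role of respondsToSelector:, so a delegate only needs to implement the methods it cares about.

```python
class CompletionTracker:
    """Hypothetical sketch of the subTasksRunning bookkeeping."""

    def __init__(self, identifiers, delegate=None):
        self.running = set(identifiers)   # subtasks that have not finished yet
        self.delegate = delegate

    def _notify(self, method_name, *args):
        # Call the delegate method only if the delegate provides it.
        method = getattr(self.delegate, method_name, None)
        if callable(method):
            method(*args)

    def subtask_did_finish(self, identifier):
        self.running.discard(identifier)
        self._notify("did_finish_subtask", identifier)
        # An empty set means every subtask has reported back.
        if not self.running:
            self._notify("did_finish_all")
```

One difference from the Objective-C code: the sketch calls the final notification directly, whereas DistributedTask schedules it with performSelector:withObject:afterDelay:, so that it is delivered on a later pass through the run loop.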

That concludes the DistributedTask code. This xgrid wrapper is quite general, and you could very easily reuse it in your own Cocoa software. I fully expect that in the course of time Apple will publish API that includes a class very similar to DistributedTask, but in the meantime, DistributedTask will at least give you a taste of Xgrid in Cocoa, and hopefully teach you a few tricks with NSTask and multithreading.

A Not-so-Bitter PIL

Now I have a confession to make: Last time I promised to create a batch image processing app using the library ImageMagick. Well, after spending a few days trying to build something in Cocoa that would link to the ImageMagick libraries, or a command-line ImageMagick tool, and that did not have any dependencies on dynamic libraries, I began to despair. Then I remembered the Python Imaging Library (PIL).

PIL, as the name would suggest, is a library for Python. Python is a great high-level scripting language, powerful, and yet extremely simple. It fills a similar role to Perl, but is easier to master, and is object-oriented to the core.

The Mac has attracted an enthusiastic clan of Python developers, and they tend to congregate around the pythonmac.org and MacPython web sites. The Mac Python developers do a fantastic job and have managed to convince Apple to include a Python framework in Panther. You can find it at /System/Library/Frameworks/Python.framework.

Compared to the pain I experienced trying to build a standalone version of ImageMagick, installing PIL was simplicity itself. A quick note to the MacPython mailing list turned up a standalone module built by Bob Ippolito. If you want to install this yourself:

  1. Install the MacPython extensions found at the MacPython web site.
  2. Open the application PackageManager in the MacPython directory in Applications.
  3. Choose 'Open URL...' from the File menu, and enter Bob's repository address:
    http://undefined.org/python/pimp/darwin-7.2.0-Power_Macintosh.plist
  4. Select PIL in the list of modules, and click install.

This little exercise demonstrates one of the complications of grid computing, namely that in general you need to be able to build a standalone version of your software, with no dependencies on non-standard dylibs or frameworks. You can check what libraries and frameworks an executable or library makes use of using the command otool. Just issue this:


otool -L path/to/binary

on the command line.
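If you have many binaries to check, the output of otool -L is easy to post-process. The sketch below is only an illustration: it assumes the usual layout, in which the first line names the binary and each following line holds a library path followed by version information in parentheses, and it treats anything outside /usr/lib and /System/Library as non-standard. The function names and the sample text are made up; the sample is not the output of a real binary.

```python
SYSTEM_PREFIXES = ("/usr/lib/", "/System/Library/")


def parse_otool_libraries(otool_output):
    """Extract library paths from the text printed by 'otool -L'."""
    libraries = []
    # Skip the first line, which names the binary being inspected.
    for line in otool_output.splitlines()[1:]:
        line = line.strip()
        if line:
            # Drop the trailing "(compatibility version ..., current version ...)".
            libraries.append(line.split(" (", 1)[0])
    return libraries


def nonstandard_libraries(otool_output):
    """Return the dependencies that do not live in a system location."""
    return [path for path in parse_otool_libraries(otool_output)
            if not path.startswith(SYSTEM_PREFIXES)]


# Illustrative text only.
SAMPLE = (
    "path/to/binary:\n"
    "\t/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1.0.0)\n"
    "\t/opt/example/lib/libexample.dylib (compatibility version 1.0.0, current version 1.0.0)\n"
)
```

Anything reported by nonstandard_libraries is a candidate for trouble on an agent that does not have the same software installed.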

I won't turn this into a Python lesson, but I do want to show you some parts of the script that performs the image processing, so you get an idea of how the agent-side code works, and how beautiful Python code is. The file agentrunscript.py begins like this:


#!/usr/bin/env python
import sys
import os
import os.path 
import string

# Add PIL directory to module search path
workingDir = os.getcwd()
pilPath = os.path.join(workingDir, "PIL")
sys.path.append( pilPath )

After importing some modules, the workingDir variable is set to the current working directory, using the function getcwd from the module os. A path is generated from this for the location of PIL, which will be in the subdirectory PIL in the working directory. The function join, from the module os.path, is used to achieve this. The PIL directory path is then added to the search path used by Python to find modules, using the sys.path.append function. Python will now be able to find our copy of PIL when we come to use it.

Next, the script reads standard input. Standard input is used to send a list of the filters that should be applied to the images. The filter names are separated by colons.


# Read filters from standard input
filtersString = sys.stdin.read()
filters = string.split(filtersString, ":")

Standard input is read into the variable filtersString, and then the function split from the module string is used to split it into a list of filter names, which is put in the variable filters. The split function takes an optional argument that is the separator used for splitting the string; this has been set to a colon.
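One thing to watch with a bare split is stray whitespace: if the input ends with a newline, the last filter name will carry it along and will no longer compare equal to, say, "blur". A slightly more defensive variant, written here for current Python 3 rather than the Python shipped with Panther, might look like this. The function name parse_filters is hypothetical.

```python
def parse_filters(text):
    """Split a colon-separated filter list, ignoring blanks and whitespace."""
    names = []
    for name in text.split(":"):
        name = name.strip()
        # Empty input, or a doubled colon, produces empty entries; skip them.
        if name:
            names.append(name)
    return names
```

With this version, an empty standard input gives an empty list rather than a list holding one empty string.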

Various modules are then imported from the PIL library, and a list of JPEG files in the working directory is created.


# Import PIL
# Filter any jpegs in the working directory
import Image
import ImageFilter
import ImageEnhance
import ImageOps
import glob
jpegFiles = glob.glob("*.[jJ][pP][gG]") + glob.glob("*.[jJ][pP][eE][gG]")
for infile in jpegFiles:
    im = Image.open(infile).copy()
    if "thumbnail" in filters:
        im.thumbnail((128, 128), Image.ANTIALIAS)
    if "blur" in filters:
        im = im.filter(ImageFilter.BLUR)
    if "emboss" in filters:
        im = im.filter(ImageFilter.EMBOSS)

    ...

    im.save(infile, "JPEG")

The function glob works much like globbing works in UNIX shell scripting. The wildcard matches zero or more of any character, and each pair of letters in square brackets matches exactly one letter in the filename, in either case.
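The matching rules can be tried out without touching the file system, using the fnmatch module from the Python standard library, which applies the same pattern syntax to plain strings. This is just a sketch for experimenting; the function name is_jpeg_name is hypothetical, and the two patterns cover the .jpg and .jpeg extensions in any mix of upper and lower case.

```python
import fnmatch

JPEG_PATTERNS = ("*.[jJ][pP][gG]", "*.[jJ][pP][eE][gG]")


def is_jpeg_name(name):
    """Return True if name ends in .jpg or .jpeg, ignoring case."""
    # fnmatchcase compares case-sensitively on every platform, so the
    # bracket expressions alone are responsible for accepting both cases.
    return any(fnmatch.fnmatchcase(name, pattern) for pattern in JPEG_PATTERNS)
```

Calling is_jpeg_name("baby.JPG") or is_jpeg_name("baby.jpeg") returns True, while is_jpeg_name("baby.png") returns False.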

A loop is used to iterate over the JPEG files; each one is opened, using the open function from the PIL module Image, and copied. A series of if branches then checks for each filter type in the filters list. If the filter name is found in the list, the filter is applied. There are several different ways of applying filters, and each filter tends to have its own special arguments. You can learn more by reading the PIL manual. The loop, and with it the script, ends by saving the filtered image under the original file name.

Hopefully this demonstrates to you that Python is not only an elegantly simple language, but also a powerful one. We have been able to build lists of files, split strings, search lists, and process images with amazing ease. If you need a scripting language for your Xgrid activities, Python comes highly recommended, not least because it is installed on every Mac sporting Panther or higher.

Controlling the Industry

The last part of the puzzle is the Cocoa controller class, PIController, which prepares the Xgrid job, and takes care of the User Interface (UI). We won't deal with the UI here; the source code is there for all to see. Instead, we will concentrate only on those parts of the controller that deal with preparing jobs for Xgrid.

The awakeFromNib method of PIController reads the filters.plist property list file to initialize the filters available in Photo Industry.


-(void)awakeFromNib {
    ...
    // Initialize available filters from plist file
    NSBundle *bundle = [NSBundle bundleForClass:[self class]];
    NSString *plistPath = [bundle pathForResource:@"filters" ofType:@"plist"];
    NSData *data = [NSData dataWithContentsOfFile:plistPath];
    NSString *errorString;
    NSArray *filtersArray = [NSPropertyListSerialization propertyListFromData:data 
        mutabilityOption:NSPropertyListMutableContainers 
        format:NULL 
        errorDescription:&errorString];
    NSAssert( nil != filtersArray, @"Could not read property list of filters." );
    [self setFilters:filtersArray];
}

The main NSBundle is used to locate the file, and the data is then read into an instance of NSData. This data is turned into an array of filter information by the NSPropertyListSerialization class. There are easier ways to do this, like simply calling the NSArray method arrayWithContentsOfFile:, but we have taken the long route because we want our array to be populated with mutable objects. The option NSPropertyListMutableContainers achieves this objective. The objects need to be mutable, because they will be used to store whether a filter is on or off, and this can be changed by the user.

You may be wondering what sort of objects make up filtersArray. They are simply NSMutableDictionary objects, as you can see by taking a look in the filters.plist file.


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" 
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<array>
    <dict>
        <key>filterId</key>
        <string>thumbnail</string>
        <key>filterName</key>
        <string>Thumbnail</string>
        <key>isOn</key>
        <false/>
    </dict>
    <dict>
        <key>filterId</key>
        <string>blur</string>
        <key>filterName</key>
        <string>Blur</string>
        <key>isOn</key>
        <false/>
    </dict>

As you can see, each dictionary in the array holds three key-value pairs: the filter's identifier, which is used in Photo Industry and the Python agent script to refer to the filter; the filter's name, which is what the filter is called in the UI; and whether it is on or off. All are initially turned off, but the user can turn on filters in the UI, and this requires that the dictionary entry be able to change (i.e., be mutable).
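If you want to poke at a file like this outside Cocoa, Python's standard plistlib module reads the same format. The sketch below assumes Python 3 and embeds a two-filter plist; the closing array and plist tags are my own completion of the excerpt, not a copy of the real filters.plist. The lists and dictionaries that plistlib returns are ordinary mutable Python objects, so switching a filter on is a plain assignment.

```python
import plistlib

# Two filters only; the real file may list more.
FILTERS_PLIST = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<array>
    <dict>
        <key>filterId</key>
        <string>thumbnail</string>
        <key>filterName</key>
        <string>Thumbnail</string>
        <key>isOn</key>
        <false/>
    </dict>
    <dict>
        <key>filterId</key>
        <string>blur</string>
        <key>filterName</key>
        <string>Blur</string>
        <key>isOn</key>
        <false/>
    </dict>
</array>
</plist>
"""

# A list of dicts, one per filter.
filters = plistlib.loads(FILTERS_PLIST)

# Simulate the user switching on the blur filter in the UI.
filters[1]["isOn"] = True

# Colon-separated ids of the enabled filters, the form the agent script reads.
enabled = ":".join(f["filterId"] for f in filters if f["isOn"])
```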

Most of the PIController code that relates to Xgrid can be found in the method applyFilters:toFilesWithPaths:forOutputDirectoryPath:. If you write your own Xgrid-enabled Cocoa app, you will likely need a very similar method, so let's take a good look at how it works.

We will skip anything that is related to the UI, rather than Xgrid. Here is how the method begins.


-(void)applyFilters:(NSArray *)filter toFilesWithPaths:(NSArray *)paths
    forOutputDirectoryPath:(NSString *)eventualOutputDirPath {

    ...
        
    // Create distributed task
    NSFileManager *fm = [NSFileManager defaultManager];
    DistributedTask *task = 
        [[[DistributedTask alloc] initWithControllerURLString:@"localhost"] autorelease];
    
    // Store the distributed task. Register for delegate messages.
    [self setDistributedTask:task];
    [task setDelegate:self];

Here we are introduced to the class NSFileManager. You had better get used to it, because this guy is going to be your partner for much of the rest of this article. NSFileManager takes care of the stuff that commands like mv, cp, rm, and ln do in a shell. Anytime you need to move, copy, remove, or link a file, you know who you have to see.

At this point we also create our distributed task, which is our interface to Xgrid. The PIController is made delegate of the task, and the DistributedTask is initialized to make use of a controller on the local machine, localhost. No attempt is made to check whether there is actually a controller on the local host, and it is not possible to run Photo Industry using a controller on another computer.

This simplistic approach could easily be improved upon: Controllers advertise themselves with Rendezvous, so you can go looking for them, and when you find them, you can query them about things like how many nodes they have available, using the xgrid command-line tool (see the -node option in the xgrid man page). Such an approach would lead to a much more flexible piece of software, but it is too advanced for this introduction. You can read a good introduction to using Rendezvous in Cocoa by Mike Beam here.

PIController then sets up some directories for the task.


    // Set the output directory path.
    [self setOutputDirectoryPath:eventualOutputDirPath];

    // Setup a temporary directory.
    // Also create a directory where the output will end up.
    NSString *uniqueString = [[NSProcessInfo processInfo] globallyUniqueString];
    NSString *dirName = 
        [NSString stringWithFormat:@"photoindustry_%@", uniqueString];
    [self setTaskTempDirectoryPath:
        [NSTemporaryDirectory() stringByAppendingPathComponent:dirName]];
    NSString *taskOutputDirPath = 
        [taskTempDirPath stringByAppendingPathComponent:@"output"];
    
    [fm createDirectoryAtPath:taskTempDirPath attributes:nil];
    [fm createDirectoryAtPath:taskOutputDirPath attributes:nil];

It stores the output directory that the user has requested (i.e., eventualOutputDirPath). This is not the output directory used by DistributedTask; it is the place where all photos must eventually end up. The DistributedTask will put its output in a temporary directory, which is created next. An NSProcessInfo object is used to generate a unique string, which is then used to come up with a name for the task's temporary directory, reducing the likelihood that any conflict will occur. This directory, which holds all files used by the DistributedTask, is a subdirectory of the directory returned by the Cocoa function NSTemporaryDirectory. In the task's directory, another subdirectory is created exclusively for output from the DistributedTask. The NSFileManager is used to do the directory creation.
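For readers who think more easily in Python, the language of our agent script, the same directory layout can be sketched as follows. This is only an illustration, not code from Photo Industry: the function name make_task_directories is hypothetical, and uuid4 stands in for NSProcessInfo's globallyUniqueString.

```python
import os
import tempfile
import uuid


def make_task_directories(base_dir=None):
    """Create a uniquely named task directory containing an 'output' subdirectory.

    base_dir defaults to the system temporary directory, playing the role
    that NSTemporaryDirectory() plays in the Cocoa code.
    """
    if base_dir is None:
        base_dir = tempfile.gettempdir()
    # Assumption: a random UUID is unique enough to avoid name conflicts.
    dir_name = "photoindustry_%s" % uuid.uuid4().hex
    task_temp_dir = os.path.join(base_dir, dir_name)
    task_output_dir = os.path.join(task_temp_dir, "output")
    # makedirs creates task_temp_dir as well as the output subdirectory.
    os.makedirs(task_output_dir)
    return task_temp_dir, task_output_dir
```

Calling the function twice gives two different task directories, which is the whole point of the unique string.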

Next, a colon-separated list of filters is created, the same one that our Python script received on standard input.


    // Create a standard input file for all subtasks
    // This is just a colon-separated list of the filter ids of the filters
    // that need to be applied
    NSMutableArray *filterStringArray = [NSMutableArray arrayWithCapacity:10];
    NSEnumerator *en = [filters objectEnumerator];
    NSDictionary *filterDict;
    while ( filterDict = [en nextObject] ) {
        BOOL isOn = [[filterDict objectForKey:@"isOn"] boolValue];
        if ( isOn ) 
            [filterStringArray addObject:[filterDict objectForKey:@"filterId"]];
    }
    NSString *stdInString = [filterStringArray componentsJoinedByString:@":"];
    NSString *siPath = 
        [taskTempDirPath stringByAppendingPathComponent:@"standardinput"];
    [stdInString writeToFile:siPath atomically:NO];

We simply iterate over all the filter dictionaries in the filters array, checking if they are on or off. If on, they are added to our list. Lastly, this list is written to a file in the task's directory. Later this file will be set as the standard input of the subtasks in our DistributedTask.
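To make the file format concrete, here is a small Python sketch of both ends of the exchange: building the colon-separated string, and splitting it apart again as a script on the agent might. The function names and the filter ids (blur, sepia, sharpen) are made up for illustration; only the keys isOn and filterId and the file name standardinput come from the code above.

```python
import os
import tempfile


def write_standard_input(filters, task_temp_dir):
    """Write the ids of the enabled filters, joined by colons, to a file."""
    on_ids = [f["filterId"] for f in filters if f["isOn"]]
    path = os.path.join(task_temp_dir, "standardinput")
    with open(path, "w") as fh:
        fh.write(":".join(on_ids))
    return path


def read_filter_ids(text):
    """Split a colon-separated filter list; an empty string means no filters."""
    text = text.strip()
    return text.split(":") if text else []


# Hypothetical filter dictionaries, in the shape the controller iterates over.
filters = [
    {"filterId": "blur", "isOn": True},
    {"filterId": "sepia", "isOn": False},
    {"filterId": "sharpen", "isOn": True},
]
path = write_standard_input(filters, tempfile.mkdtemp())
with open(path) as fh:
    contents = fh.read()
print(contents)                   # blur:sharpen
print(read_filter_ids(contents))  # ['blur', 'sharpen']
```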

Now the subtasks must be set up.


    // Create an input directory for each subtask in the temporary directory.
    // Copy photos into the input directories of subtasks.
    // Distribute photos as evenly as possible amongst subtasks. If the 
    // number of photos doesn't exactly divide by the number of subtasks, some
    // subtasks are required to process one extra photo. 
    unsigned baseNumPhotosPerSubTask = [paths count] / NumDistributedSubTasks;
    unsigned numSubTasksWithOneExtra = [paths count] % NumDistributedSubTasks;
    unsigned subTaskIndex, photoIndex = 0;
    for ( subTaskIndex = 0; subTaskIndex < NumDistributedSubTasks; 
          subTaskIndex++ ) {
        NSString *subTaskIndexString = 
            [NSString stringWithFormat:@"%u", subTaskIndex];
        NSString *inputDirPath = 
            [taskTempDirPath stringByAppendingPathComponent:subTaskIndexString];
        [fm createDirectoryAtPath:inputDirPath attributes:nil];
        
        // Copy photos to the input directory for the subtask
        unsigned numPhotosThisSubTask = baseNumPhotosPerSubTask;
        if ( subTaskIndex < numSubTasksWithOneExtra ) ++numPhotosThisSubTask;
        if ( numPhotosThisSubTask == 0 ) continue; 
            // Don't start subtask for no photos
        unsigned subTaskPhotoIndex;
        for ( subTaskPhotoIndex = 0; subTaskPhotoIndex < numPhotosThisSubTask; 
              subTaskPhotoIndex++ ) {
            NSString *photoPath = [paths objectAtIndex:photoIndex];
            [fm copyPath:photoPath toDirectoryAtPath:inputDirPath];
            photoIndex++;
        }

Some arithmetic is performed to determine how many photos each subtask should process. The algorithm simply tries to spread the number of photos as evenly as possible over the subtasks. If the number of photos does not divide exactly by the number of subtasks, some subtasks are required to take one extra photo. The number of subtasks is simply a constant, NumDistributedSubTasks, which is elsewhere set to 4.
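The arithmetic is easier to see with numbers plugged in. The following Python sketch mirrors the division and remainder used above; the function name photos_per_subtask is hypothetical.

```python
def photos_per_subtask(num_photos, num_subtasks):
    """Return how many photos each subtask receives, spread as evenly as possible."""
    # base is the integer quotient; extra is the remainder, that is, the
    # number of subtasks that must take one photo more than the base.
    base, extra = divmod(num_photos, num_subtasks)
    return [base + 1 if index < extra else base
            for index in range(num_subtasks)]


print(photos_per_subtask(10, 4))  # [3, 3, 2, 2]
print(photos_per_subtask(3, 4))   # [1, 1, 1, 0]
```

In the second case the last subtask gets no photos at all, which is why the Cocoa loop skips any subtask whose count is zero.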

A loop over subtasks begins, and an input directory is created for each subtask. The subtask's photos, the paths to which are passed to the method, are then copied to the input directory. Creating a link would be faster, but I found some unnerving behavior whenever a linked file is deleted: the Finder seems to think that all links to a file are deleted when any one of them is. This seems to be an error in Finder, not in the filesystem itself, because the linked file does continue to exist. Nonetheless, I thought it was safer to copy the original files so that should anything go wrong, they would not be lost.

You will not find the method copyPath:toDirectoryAtPath:, which belongs to NSFileManager, in the Cocoa documentation. That's because it belongs to a category that I have created in Photo Industry. This is what it looks like.


@interface NSFileManager (PIControllerExtensions)
-(void)copyPath:(NSString *)path toDirectoryAtPath:(NSString *)inputDir;
@end


@implementation NSFileManager (PIControllerExtensions)

-(void)copyPath:(NSString *)path toDirectoryAtPath:(NSString *)dirPath {
    NSString *filePathInDir = 
        [dirPath stringByAppendingPathComponent:[path lastPathComponent]];    
    [[NSFileManager defaultManager] copyPath:path toPath:filePathInDir handler:nil];
}

@end

This method is a convenience, because we regularly need to copy files to directories, and it is a bit annoying to have to keep using the stringByAppendingPathComponent: method to first set up the new file path, when the file name does not need to change.

We can now finish off the applyFilters:toFilesWithPaths:forOutputDirectoryPath: method.


        // Copy PIL to the input directory
        NSBundle *bundle = [NSBundle bundleForClass:[self class]];
        NSString *pilPath = [bundle pathForResource:@"PIL" ofType:nil];
        NSAssert( nil != pilPath, @"PIL path was nil." );
        [fm copyPath:pilPath toDirectoryAtPath:inputDirPath];
        
        // Add subtask to task
        NSString *scriptPath = 
            [bundle pathForResource:@"agentrunscript" ofType:@"py"];
        [task addSubTaskWithIdentifier:[NSNumber numberWithInt:subTaskIndex]
            launchPath:scriptPath
            workingDirectoryPath:inputDirPath
            outputDirectoryPath:taskOutputDirPath
            standardInputPath:siPath
            standardOutputPath:nil];
    }
    
    [task launch];
    
}

Still inside the loop over subtasks, we copy the directory containing PIL to the input directory of the subtask. The subtask is then added to the distributed task, using the method discussed earlier, setting the launch path, input and output directories of the subtask, and the standard input file. We don't need the standard output, so nil is passed for that. Finally, when all subtasks have been added to the distributed task, the task is launched.

The progress of the DistributedTask is monitored using its delegate methods. The PIController was set as the delegate to the task, so it can implement the methods, and act upon them as required. We will take a look at the method called when the DistributedTask has finished running all its subtasks on Xgrid: distributedTaskDidFinishSubTasks:.

The method first checks that the output directory that the user requested exists. If not, it creates it. If it does exist, and is not a directory, an NSAssert ensures an exception is raised. In a more robust app, you would want to handle this better, by informing the user of the problem.


-(void)distributedTaskDidFinishSubTasks:(DistributedTask *)distributedTask {    
    NSFileManager *fm = [NSFileManager defaultManager];
    
    // Ensure output directory exists, and that it is a directory.
    BOOL isDir;
    if ( [fm fileExistsAtPath:[self outputDirectoryPath] isDirectory:&isDir] ) {
        NSAssert( isDir, @"Output directory path supplied was not a directory." );
    }
    else {
        [fm createDirectoryAtPath:[self outputDirectoryPath] attributes:nil];
    }
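As a rough idea of what "handling this better" might look like, here is a Python sketch that reports the problem instead of asserting. It is not part of Photo Industry; the function name ensure_output_directory and the wording of the messages are invented for the example, and a real Cocoa app would show the message in an alert or sheet.

```python
import os


def ensure_output_directory(path):
    """Make sure path is a usable directory.

    Returns None on success, or a message that could be shown to the user.
    """
    if os.path.exists(path):
        if not os.path.isdir(path):
            return "'%s' exists but is not a directory." % path
        return None
    try:
        os.makedirs(path)
    except OSError as err:
        return "Could not create '%s': %s" % (path, err)
    return None
```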

Next, the filtered photos, which should be in the output directory of the DistributedTask, are moved to the user's chosen output destination.


    // Move task output files to the output directory
    NSString *taskOutputDirPath = 
        [taskTempDirPath stringByAppendingPathComponent:@"output"];
    [fm changeCurrentDirectoryPath:taskOutputDirPath];
    NSDirectoryEnumerator *en = [fm enumeratorAtPath:taskOutputDirPath];
    NSString *relativePath; // Path relative to taskOutputDirPath
    while ( relativePath = [en nextObject] ) {
        if ( NSOrderedSame != 
                 [[relativePath pathExtension] caseInsensitiveCompare:@"jpg"] &&
             NSOrderedSame != 
                 [[relativePath pathExtension] caseInsensitiveCompare:@"jpeg"] ) 
            continue;
        NSString *fileOutputPath = 
            [[self outputDirectoryPath] stringByAppendingPathComponent:
                 [relativePath lastPathComponent]];
        if ( [fm fileExistsAtPath:fileOutputPath] ) 
            [fm removeFileAtPath:fileOutputPath handler:nil];
        [fm linkPath:relativePath toPath:fileOutputPath handler:nil];
    }

An NSDirectoryEnumerator is employed for this operation. An NSDirectoryEnumerator traverses the contents of a directory, including subdirectories. For each file found, we check if it is a JPEG, and, if so, link it to the output directory. Yes, in this case we do use link, instead of copy, because the operation doesn't pose any threat to the original photos, and is faster.
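The same traversal can be sketched in Python, with os.walk standing in for NSDirectoryEnumerator and os.link for linkPath:toPath:handler:. This is a simplified illustration under a couple of assumptions: the function name link_jpegs is hypothetical, the link is taken to be a hard link, and both directories are assumed to live on the same volume, since a hard link cannot cross filesystems.

```python
import os

JPEG_EXTENSIONS = (".jpg", ".jpeg")


def link_jpegs(task_output_dir, final_output_dir):
    """Hard-link every JPEG found under task_output_dir into final_output_dir."""
    linked = []
    for dirpath, _dirnames, filenames in os.walk(task_output_dir):
        for name in filenames:
            # Case-insensitive check of the file extension.
            if os.path.splitext(name)[1].lower() not in JPEG_EXTENSIONS:
                continue
            source = os.path.join(dirpath, name)
            destination = os.path.join(final_output_dir, name)
            # Replace any existing file of the same name, as the Cocoa code does.
            if os.path.exists(destination):
                os.remove(destination)
            os.link(source, destination)
            linked.append(destination)
    return sorted(linked)
```

As in the Cocoa version, only the last path component is kept, so two photos with the same name in different subdirectories would collide, and the one visited later would win.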

Finally, we clean up, by removing the entire task temporary directory, which includes all of the files and directories used by the subtasks.


    // Remove temporary directory
    [fm removeFileAtPath:taskTempDirPath handler:nil];
    ...
}

Odds and Ends

We have now covered the parts of Photo Industry that relate directly to Xgrid. If you download the source, you will see there is a lot of other stuff in the application, which could also be useful in your own apps. Photo Industry makes use of drag and drop, for example, and the new Cocoa bindings layer. A good intro to the drag-and-drop techniques can be found on CocoaDevCentral. Bindings are also covered by CocoaDevCentral here, and don't forget that old stalwart Mike Beam, who has recently written on the topic here.

I hope this two-part article has demonstrated the potential of Xgrid in non-scientific Cocoa applications. We have had to do a lot of work in order to leverage that potential, using the command-line xgrid tool, but in all likelihood WWDC will alleviate that in the near future. Hopefully, this article will be irrelevant after June 28, 2004, when SJ finally unveils Apple's vision for Xgrid and the future of distributed computation. To be continued ...

Drew McCormack works at the Free University in Amsterdam, and develops the Cocoa shareware Trade Strategist.


Copyright © 2009 O'Reilly Media, Inc.