In Part 1 of this two-part extravaganza we covered a lot of ground, with barely a mention of Cocoa. We were witness to a future vision for Xgrid, installed its present incarnation, played with it a bit, and then got down and dirty on the command line. The title would suggest that there may actually be some Cocoa in these pages, and I am here to assure you that it wasn't all just a PR stunt to capture your attention. I will now deliver on the promise.
In this part of our journey into Xgrid, we're going to develop a little Cocoa
application called Photo Industry. This will be an Xgrid-enabled app,
and what's more, it will be a standalone application, not an Xgrid client plugin.
To achieve this goal, we'll leverage the xgrid command-line tool
using a Cocoa class called NSTask. I hope that in the near future,
perhaps as early as WWDC, Apple will make this technique obsolete by publishing
an Xgrid client Cocoa API, but until that time we can at least appreciate the
potential of Xgrid by wrapping the xgrid tool.
Figure: Photo Industry, an Xgrid-enabled JPEG image filtering application. Baby courtesy of proud parents.
Photo Industry is a JPEG image filtering program. The best way to see what it does is to download the finished product, and try it out. Here is how:
Play around with the app for a few minutes. See what happens to the processing time when you add an agent to your Xgrid "cluster" or take one away. And when you are ready, we can start looking at how it all works.
xgrid in Cocoa
To get started, download the Xcode project and source code here. Unpack it, and open it in Xcode.
The most generic class in this project is called DistributedTask.
It is the class that wraps the xgrid tool, providing its functionality
to the rest of the Cocoa source. You could probably use this class directly
in your own projects without any modification at all.
DistributedTask is really nothing more than an Xgrid class, and
should an Apple Xgrid API ever appear, I would expect something very similar
to DistributedTask to be in it. The name has been inspired by similarities
with Cocoa's NSTask: DistributedTask is basically
the same as NSTask, but runs a series of commands on a distributed
Xgrid network, rather than just a single command on the local machine.
However, the name could cause some confusion, so let's clarify things before we start. What we call a distributed task in Photo Industry roughly corresponds to a job in Xgrid. A distributed task is made up of a number of subtasks, which are approximately the same as tasks in Xgrid terminology. Hopefully I've made that confusing enough for you.
The DistributedTask class itself is pretty straightforward, though
it does involve multithreading. Here is the public interface:
#import <Foundation/Foundation.h>
@class DistributedTask;
@interface NSObject (DistributedTaskDelegateMethods)
-(void)distributedTaskDidLaunchSubTasks:(DistributedTask *)distributedTask;
-(void)distributedTask:(DistributedTask *)distributedTask
didFinishSubTaskWithIdentifier:(id)identifier;
-(void)distributedTaskDidFinishSubTasks:(DistributedTask *)distributedTask;
@end
@interface DistributedTask : NSObject {
NSString *controllerURLString;
NSMutableDictionary *subTasks;
NSMutableSet *subTasksRunning;
BOOL taskHasLaunched;
id delegate;
}
-(id)initWithControllerURLString:(NSString *)controllerURLString;
-(void)setDelegate:(id)del;
-(id)delegate;
-(unsigned)numberOfSubTasks;
-(NSString *)controllerURLString;
-(void)addSubTaskWithIdentifier:(id)identifier
launchPath:(NSString *)launchPath
workingDirectoryPath:(NSString *)workingDirPath
outputDirectoryPath:(NSString *)outputDirPath
standardInputPath:(NSString *)standardInputPath
standardOutputPath:(NSString *)standardOutputPath;
-(void)launch;
@end
As you can see, the class makes use of a delegate, which is a common technique in Cocoa. The delegate is registered to receive information about the distributed task when certain events occur. In this case, a delegate message is sent when the distributed task is launched, when one of its subtasks completes, and when all subtasks have finished.
The class interface block includes a number of attributes that are important for the implementation, and we will discuss these as they come up.
The DistributedTask initializer takes a single argument, namely
the Xgrid controller to which it should connect. We just use localhost
in Photo Industry, but you could very easily implement a more complicated controller
location scheme involving Rendezvous.
The delegate accessor methods are also present in the interface, along with
an accessor for the controller URL. Note that there is only a getter for the
URL; you can't change the controller after you have initialized the class object,
so if later you want to use a different controller, you need to create a new
DistributedTask.
Two more rather important methods follow. The first is the method you use to
add subtasks to the distributed task. It takes a number of arguments, such as
the path to the command to launch; working directory path; output directory
path; and standard input and output paths. You may recognize these arguments
as corresponding to the command-line arguments of the xgrid command.
That's no coincidence, because DistributedTask is really nothing
more than a Cocoa xgrid command.
The last method, launch, is pretty self-explanatory. "Launch" is
used instead of "run," or another apt verb, in an attempt at consistency with
the terminology used in NSTask.
The implementation file of DistributedTask begins by declaring
some private methods, and a number of strings.
@interface DistributedTask (PrivateMethods)
-(void)runSubTaskAsynchronouslyWithDictionary:(NSDictionary *)subTaskDict;
-(void)subTaskDidFinishWithDictionary:(NSDictionary *)subTaskDict;
@end
// SubTask dictionary keys
static NSString *SubTaskIdKey = @"SubTaskIdKey";
static NSString *LaunchPathKey = @"LaunchPathKey";
static NSString *OutputDirectoryKey = @"OutputDirectoryKey";
static NSString *WorkingDirectoryKey = @"WorkingDirectoryKey";
static NSString *StandardInputKey = @"StandardInputKey";
static NSString *StandardOutputKey = @"StandardOutputKey";
We'll address the private methods below. The strings are all keys for a dictionary.
Rather than defining a separate SubTask class, I have opted to
simply use dictionaries to store information pertaining to each subtask. Defining
the keys like this reduces the chances of making a spelling error in a string,
which would result in a run-time bug, and also makes explicit what entries a
subtask dictionary contains. It is really very similar to defining a struct
in C; after all, a dictionary is really no more than a dynamic struct.
The implementation proper begins with the initializer and deallocator.
@implementation DistributedTask
-(id)initWithControllerURLString:(NSString *)url {
if ( self = [super init] ) {
controllerURLString = [url copy];
subTasks = [[NSMutableDictionary alloc] initWithCapacity:10];
subTasksRunning = [[NSMutableSet alloc] initWithCapacity:10];
taskHasLaunched = NO;
delegate = nil;
}
return self;
}
-(void)dealloc {
// Release the objects owned by this instance
[controllerURLString release];
[subTasks release];
[subTasksRunning release];
[super dealloc];
}
The initializer stores the controller URL, and creates some component objects,
such as the subTasks dictionary and the subTasksRunning
set. The dictionary is used to store details of the subtasks, and the set is
used to keep track of which of the subtasks are running and thus which are complete.
The taskHasLaunched boolean flag is used to keep track of whether
the task has been launched before. A DistributedTask should only
be launched once; if you need to repeat a calculation, simply create a new DistributedTask.
A number of straightforward accessors follow.
-(void)setDelegate:(id)del {
delegate = del;
}
-(id)delegate {
return delegate;
}
-(unsigned)numberOfSubTasks {
return [subTasks count];
}
-(NSString *)controllerURLString {
return controllerURLString;
}
Note that many attributes of the class do not include setters. You can't change the controller URL after initialization, for example. Making aspects of a class immutable, like this, simplifies writing and using the class.
The method to add subtasks to the task looks like this:
// The working directory, output directory, standard input and output are all
// optional. Use nil if they are not to be used.
#define SubNSNullForNil(var) ( var == nil ? (id)[NSNull null] : (id)var )
-(void)addSubTaskWithIdentifier:(id)identifier
launchPath:(NSString *)launchPath
workingDirectoryPath:(NSString *)workingDirPath
outputDirectoryPath:(NSString *)outputDirPath
standardInputPath:(NSString *)standardInputPath
standardOutputPath:(NSString *)standardOutputPath {
NSDictionary *dict = [NSDictionary dictionaryWithObjectsAndKeys:
identifier, SubTaskIdKey,
launchPath, LaunchPathKey,
SubNSNullForNil(workingDirPath), WorkingDirectoryKey,
SubNSNullForNil(outputDirPath), OutputDirectoryKey,
SubNSNullForNil(standardInputPath), StandardInputKey,
SubNSNullForNil(standardOutputPath), StandardOutputKey,
nil];
[subTasks setObject:dict forKey:identifier];
// Also add to the subTasksRunning set, which will be used to keep track of
// which subTasks have finished when the task is run.
[subTasksRunning addObject:identifier];
}
This method has a lot of arguments. Each argument corresponds directly to
an xgrid command option. As the comment states, you can pass nil
for some of the arguments, and these will then not be passed along to xgrid.
The method itself adds a dictionary describing the subtask to the dictionary
of all subtasks. The identifier of the subtask is used as the key in the subtasks
dictionary. The keys of the dictionary describing the subtask were introduced
above, and correspond directly with the arguments of the method. Note that arguments
that are nil are substituted with an instance of NSNull,
using the macro SubNSNullForNil. This is necessary because you
can't enter nil in a dictionary.
The last line of the method adds the subtask identifier to the subTasksRunning
mutable set. As explained above, this set will be used to determine how many
subtasks are running, and how many have completed.
Now we are getting to the core of the class.
-(void)launch {
// Make sure this is the first launch
if ( taskHasLaunched )
@throw [NSException exceptionWithName:@"MultipleLaunchException"
reason:@"Attempt to launch DistributedTask multiple times."
userInfo:nil];
else
taskHasLaunched = YES;
// Launch subtasks
NSEnumerator *subTaskEn = [subTasks objectEnumerator];
NSMutableDictionary *subTaskDict;
while ( subTaskDict = [subTaskEn nextObject] ) {
// Create thread where xgrid task will be run
[NSThread detachNewThreadSelector:
@selector(runSubTaskAsynchronouslyWithDictionary:)
toTarget:self
withObject:subTaskDict];
}
// Notify delegate of launch and initial progress
if ( delegate &&
[delegate respondsToSelector:
@selector(distributedTaskDidLaunchSubTasks:)] )
[delegate distributedTaskDidLaunchSubTasks:self];
}
The launch method runs the task. It first checks to see if the
task has previously run; if it has, an exception is thrown. I use the new Objective-C
exception handling facilities throughout Photo Industry, so if you haven't given
them a look yet, this is a good opportunity.
Next, we iterate over the subtasks that are in the subTasks dictionary.
For each one, we split off a new thread. The new thread is detached with instructions
to call back to the runSubTaskAsynchronouslyWithDictionary: method
of the DistributedTask object, which is set as the target, and
to pass the subTaskDict dictionary to the method.
After all threads have been launched, a message is sent to the delegate of
the class to indicate this. As usual, this message is only sent if there is
a delegate set, and the delegate responds to distributedTaskDidLaunchSubTasks:.
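If Objective-C threading is new to you, the shape of launch can be sketched roughly in Python. This is not the Photo Industry code: the class name LaunchOnce, the worker callable, the wait method, and the use of Python's threading module in place of NSThread are all assumptions made purely for illustration.

```python
import threading


class LaunchOnce:
    """Illustrative stand-in for DistributedTask's launch logic (hypothetical)."""

    def __init__(self, subtasks, worker):
        self.subtasks = dict(subtasks)  # identifier -> subtask description
        self.worker = worker            # callable run once per subtask
        self.has_launched = False
        self.threads = []

    def launch(self):
        # Refuse a second launch, mirroring the MultipleLaunchException check.
        if self.has_launched:
            raise RuntimeError("Attempt to launch task multiple times.")
        self.has_launched = True
        # One thread per subtask, each handed its own description.
        for description in self.subtasks.values():
            thread = threading.Thread(target=self.worker, args=(description,))
            thread.start()
            self.threads.append(thread)

    def wait(self):
        # Not part of the original class; convenient for a command-line demo.
        for thread in self.threads:
            thread.join()


seen = []
task = LaunchOnce({"a": {"id": "a"}, "b": {"id": "b"}},
                  lambda description: seen.append(description["id"]))
task.launch()
task.wait()
```

The order in which the two identifiers land in seen depends on thread scheduling, which is exactly why the real class tracks completion with a set rather than assuming an order.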
The runSubTaskAsynchronouslyWithDictionary: method that each
thread calls is something of a monster.
#define ObjectForKeyIsNSNull(dict, key) \
([[dict objectForKey:key] isKindOfClass:[NSNull class]])
-(void)runSubTaskAsynchronouslyWithDictionary:(NSDictionary *)subTaskDict {
NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
@try {
// Determine the directory where the launch file is
NSString *launchPath =
[subTaskDict objectForKey:LaunchPathKey];
NSString *launchFileName =
[launchPath lastPathComponent];
NSString *launchPathDir =
[launchPath stringByDeletingLastPathComponent];
// Create and setup NSTask for xgrid command
// Run task in the directory of the launch file
NSTask *task = [[[NSTask alloc] init] autorelease];
[task setCurrentDirectoryPath:launchPathDir];
// Create an array with arguments for the NSTask
NSMutableArray *args = [NSMutableArray arrayWithObjects:
@"-h", controllerURLString,
@"-job", @"run",
nil];
if ( ! ObjectForKeyIsNSNull(subTaskDict, WorkingDirectoryKey) ) {
[args addObject:@"-in"];
[args addObject:[subTaskDict objectForKey:WorkingDirectoryKey]];
}
if ( ! ObjectForKeyIsNSNull(subTaskDict, OutputDirectoryKey) ) {
[args addObject:@"-out"];
[args addObject:[subTaskDict objectForKey:OutputDirectoryKey]];
}
if ( ! ObjectForKeyIsNSNull(subTaskDict, StandardInputKey) ) {
[args addObject:@"-si"];
[args addObject:[subTaskDict objectForKey:StandardInputKey]];
}
if ( ! ObjectForKeyIsNSNull(subTaskDict, StandardOutputKey) ) {
[args addObject:@"-so"];
[args addObject:[subTaskDict objectForKey:StandardOutputKey]];
}
[args addObject:launchFileName];
[task setArguments:args];
[task setLaunchPath:@"/usr/bin/xgrid"];
// Launch task, and wait for it to finish
[task launch];
[task waitUntilExit];
int status = [task terminationStatus];
if (status != 0) NSLog(@"xgrid task failed with status code %d", status);
}
@catch (NSException *exception) {
NSLog(@"An %@ exception was raised in runSubTaskAsynchronouslyWithDictionary: %@",
[exception name], [exception reason] );
}
@finally {
[self performSelectorOnMainThread:@selector(subTaskDidFinishWithDictionary:)
withObject:subTaskDict waitUntilDone:NO];
[pool release];
}
}
It begins by initializing an NSAutoreleasePool, and ends by releasing
it. You should do this in any method that is called by a new thread. An autorelease
pool is created for you in the main thread, but not in threads that you create
yourself. If you don't set up an autorelease pool in each new thread, you will
get annoying warning messages in the log, and every call to the autorelease
method will result in a memory leak.
The rest of the method is embedded in a @try-@catch-@finally
block. This is to catch any exceptions that may arise in the worker thread,
and which could cause the program to crash. The exception handling here is not
very advanced, but it is better than nothing. Basically, if something goes wrong,
a log message is printed, and control is returned to the main thread as if everything
went to plan. It would be better to inform the user of the problem, but we'll
leave that to version 2.0. Note that a @finally block gets executed
whether an exception is raised or not, so the autorelease pool is always released,
preventing potential memory leaks.
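The same "log it, then always report back" pattern can be sketched in Python, where try/except/finally plays the role of @try-@catch-@finally. The function run_guarded and the failing_job example are hypothetical names of my own, not part of Photo Industry.

```python
import threading


def run_guarded(job, on_finished):
    """Run job(); log any exception; call on_finished() no matter what."""
    try:
        job()
    except Exception as exc:
        print(f"A {type(exc).__name__} exception was raised in the worker: {exc}")
    finally:
        # Runs whether or not an exception occurred, like the @finally block.
        on_finished()


def failing_job():
    raise ValueError("simulated failure")


finished = []
worker = threading.Thread(target=run_guarded,
                          args=(failing_job, lambda: finished.append("done")))
worker.start()
worker.join()
```

Even though failing_job raises, the completion callback still fires, so whoever is counting finished subtasks never waits forever.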
runSubTaskAsynchronouslyWithDictionary: simply extracts the data
passed to it via the dictionary, and adds it to an array, with each entry coupled
to an appropriate xgrid option. For example, the input directory,
which becomes the working directory on the agent, is added to the array by first
inserting the option string @"-in", followed by the path retrieved
from the dictionary. This option is only included if the working directory value
in the dictionary is not an instance of the class NSNull.
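To make the mapping from subtask dictionary to command line concrete, here is a small Python sketch of the same argument-building logic. The option strings are the ones used in the listing above; the function name build_xgrid_args, its parameter names, and the example paths are assumptions for illustration only.

```python
import os.path


def build_xgrid_args(controller, launch_path, working_dir=None,
                     output_dir=None, stdin_path=None, stdout_path=None):
    """Return the argument list for one subtask, skipping unset options."""
    args = ["-h", controller, "-job", "run"]
    optional = (("-in", working_dir), ("-out", output_dir),
                ("-si", stdin_path), ("-so", stdout_path))
    for flag, value in optional:
        if value is not None:  # None plays the role of NSNull here
            args += [flag, value]
    # Only the file name is passed; the task runs in the launch file's directory.
    args.append(os.path.basename(launch_path))
    return args


example_args = build_xgrid_args("localhost", "/tmp/job/agentrunscript.py",
                                working_dir="/tmp/job/0",
                                stdin_path="/tmp/job/standardinput")
```

Here example_args contains the -in and -si pairs but no -out or -so, because those two were left unset.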
The entries in the array are set as the arguments of an NSTask,
using the method setArguments:. At the same time, other aspects
of the NSTask are set, such as the path of the executable to be launched
by the NSTask, which is /usr/bin/xgrid, and
the current directory, which is the directory containing the executable that
xgrid will run.
If you are wondering where I got these paths from, you can take a look at
the top of the method. There I make use of a few path manipulation methods from
the NSString class, like lastPathComponent and stringByDeletingLastPathComponent.
These methods are very useful, and used extensively throughout Photo Industry.
The Cocoa documentation will tell you more.
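If you know Python better than Cocoa, the two NSString methods have close cousins in the standard library. The sketch below uses posixpath so that it behaves the same on any platform; the path itself is a made-up example.

```python
import posixpath

launch_path = "/tmp/photoindustry/agentrunscript.py"  # hypothetical path

# Roughly what lastPathComponent returns
launch_file_name = posixpath.basename(launch_path)

# Roughly what stringByDeletingLastPathComponent returns
launch_path_dir = posixpath.dirname(launch_path)
```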
Returning to the bottom of the method, the NSTask is launched,
and we wait for it to complete, just like last time in the xgridcc
script. We also check the return value upon completion, and log any errors.
Whether the NSTask completes without error or not, we need to
return control to the main thread, informing the delegate that the subtask,
and possibly the whole DistributedTask, is finished. To do that
we use the convenience method performSelectorOnMainThread:withObject:waitUntilDone:.
And it really is convenient, believe me.
The method called on the main thread looks like this:
-(void)subTaskDidFinishWithDictionary:(NSDictionary *)subTaskDict {
id identifier = [subTaskDict objectForKey:SubTaskIdKey];
[subTasksRunning removeObject:identifier];
// Notify delegate of subtask completion
if ( delegate &&
[delegate respondsToSelector:
@selector(distributedTask:didFinishSubTaskWithIdentifier:)] )
[delegate distributedTask:self
didFinishSubTaskWithIdentifier:identifier];
// If finished all subtasks, notify delegate
int subTasksRemaining = [subTasksRunning count];
if ( delegate &&
[delegate respondsToSelector:@selector(distributedTaskDidFinishSubTasks:)] &&
subTasksRemaining == 0 )
[delegate performSelector:@selector(distributedTaskDidFinishSubTasks:)
withObject:self afterDelay:0.0];
}
This method keeps track of which subtasks have completed, and informs the
delegate of developments. The subtask is first removed from the subTasksRunning
set. The delegate is then informed that the subtask has completed, and lastly
the delegate is informed that the distributed task has completed, if there are
no subtasks still running.
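The bookkeeping is simple enough to sketch in a few lines of Python. The class CompletionTracker and its callback names are hypothetical; the sketch also calls the "all finished" callback directly rather than deferring it as the Objective-C code does, and it assumes every call arrives on a single thread, just as the original relies on the main thread.

```python
class CompletionTracker:
    """Hypothetical helper mirroring the subTasksRunning bookkeeping."""

    def __init__(self, identifiers, on_subtask=None, on_all=None):
        self.running = set(identifiers)
        self.on_subtask = on_subtask
        self.on_all = on_all

    def subtask_finished(self, identifier):
        # Remove the subtask from the running set first...
        self.running.discard(identifier)
        # ...then report the individual completion, if anyone is listening...
        if self.on_subtask is not None:
            self.on_subtask(identifier)
        # ...and finally report overall completion once nothing is left.
        if not self.running and self.on_all is not None:
            self.on_all()


events = []
tracker = CompletionTracker(
    ["photo-batch-0", "photo-batch-1"],
    on_subtask=lambda identifier: events.append(("subtask", identifier)),
    on_all=lambda: events.append(("all",)),
)
tracker.subtask_finished("photo-batch-0")
tracker.subtask_finished("photo-batch-1")
```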
That concludes the DistributedTask code. This xgrid
wrapper is quite general, and you could very easily reuse it in your own Cocoa
software. I fully expect that in the course of time Apple will publish an API that
includes a class very similar to DistributedTask, but in the meantime,
DistributedTask will at least give you a taste of Xgrid in Cocoa,
and hopefully teach you a few tricks with NSTask and multithreading.
Now I have a confession to make: Last time I promised to create a batch image processing app using the library ImageMagick. Well, after spending a few days trying to build something in Cocoa that would link to the ImageMagick libraries, or a command-line ImageMagick tool, and that did not have any dependencies on dynamic libraries, I began to despair. Then I remembered the Python Imaging Library (PIL).
PIL, as the name would suggest, is an imaging library for Python. Python is a great high-level scripting language, powerful, and yet extremely simple. It fills a similar role to Perl, but is easier to master, and is object-oriented to the core.
The Mac has attracted an enthusiastic clan of Python developers, and they
tend to congregate around the pythonmac.org
and MacPython web sites.
The Mac Python developers do a fantastic job and have managed to convince Apple
to include a Python framework in Panther. You can find it at /System/Library/Frameworks/Python.framework.
Compared to the pain I experienced trying to build a standalone version of ImageMagick, installing PIL was simplicity itself. A quick note to the MacPython mailing list turned up a standalone module built by Bob Ippolito. If you want to install this yourself:
This little exercise demonstrates one of the complications of grid computing,
namely that in general you need to be able to build a standalone version of
your software, with no dependencies on non-standard dylibs or frameworks. You
can check what libraries and frameworks an executable or library makes use of
using the command otool. Just issue this:
otool -L path/to/binary
on the command line.
I won't turn this into a Python lesson, but I do want to show you some parts
of the script that performs the image processing, so you get an idea of how
the agent-side code works, and how beautiful Python code is. The file agentrunscript.py
begins like this:
#!/usr/bin/env python
import sys
import os
import os.path
import string
# Add PIL directory to module search path
workingDir = os.getcwd()
pilPath = os.path.join(workingDir, "PIL")
sys.path.append( pilPath )
After importing some modules, the workingDir variable is set
to the current working directory, using the function getcwd from
the module os. A path is generated from this for the location of
PIL, which will be in the subdirectory PIL in the working directory.
The function join, from the module os.path, is used
to achieve this. The PIL directory path is then added to the search path used
by Python to find modules, using the sys.path.append function.
Python will now be able to find our copy of PIL when we come to use it.
Next, the script reads standard input. Standard input is used to send a list of the filters that should be applied to the images. The filter names are separated by colons.
# Read filters from standard input
filtersString = sys.stdin.read()
filters = string.split(filtersString, ":")
Standard input is read into the variable filtersString, and then
the function split from the module string is used
to split it into a list of filter names, which is put in the variable filters.
The split function takes an optional argument that is the separator
used for splitting the string; this has been set to a colon.
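The string.split function belongs to the Python of the day; in Python 3 the same job is done by the split method on the string itself. The sketch below is a slightly more forgiving variant of my own (the name parse_filters is hypothetical): it also drops surrounding whitespace and empty entries, which the original script does not attempt.

```python
def parse_filters(text):
    """Split a colon-separated filter list, ignoring blanks and whitespace."""
    return [name.strip() for name in text.split(":") if name.strip()]


filters = parse_filters("thumbnail:blur\n")
```

With a trailing newline on the input, filters still comes out as a clean two-element list, and an empty input yields an empty list rather than a list holding one empty string.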
Various modules are then imported from the PIL library, and a list of JPEG files in the working directory is created.
# Import PIL
# Filter any jpegs in the working directory
import Image
import ImageFilter
import ImageEnhance
import ImageOps
import glob
jpegFiles = glob.glob("*.[jJ][pP][gG]") + glob.glob("*.[jJ][pP][eE][gG]")
for infile in jpegFiles:
im = Image.open(infile).copy()
if "thumbnail" in filters:
im.thumbnail((128, 128), Image.ANTIALIAS)
if "blur" in filters:
im = im.filter(ImageFilter.BLUR)
if "emboss" in filters:
im = im.filter(ImageFilter.EMBOSS)
...
im.save(infile, "JPEG")
The function glob works much like globbing works in UNIX shell
scripting. The asterisk wildcard matches zero or more of any character, and each
pair of square brackets matches exactly one letter in the filename.
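You can try patterns like these without touching the filesystem by using the fnmatch module, which implements the same wildcard matching that glob builds on. The file names below are invented, and the two patterns are written out here for the .jpg and .jpeg extensions in any mix of upper and lower case.

```python
import fnmatch

names = ["baby.jpg", "Beach.JPG", "scan.jpeg", "notes.txt", "photo.jpg.bak"]
patterns = ["*.[jJ][pP][gG]", "*.[jJ][pP][eE][gG]"]

# Keep a name if it matches either pattern, comparing case-sensitively
matched = [name for name in names
           if any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns)]
```

The pattern must match the whole name, so photo.jpg.bak is rejected even though it contains ".jpg".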
A loop is used to iterate over the JPEG files; each one is opened, using the
open function from the PIL module Image, and copied.
if branches then check for each filter type in the filters
list. If the filter name is found in the list, the filter is applied. There
are several different ways of applying filters, and each filter tends to have
its own special arguments. You can learn more by reading the PIL
manual. The loop — and the script — end by saving the filtered
image under the original file name.
Hopefully this demonstrates to you that Python is not only an elegantly simple language, but also a powerful one. We have been able to build lists of files, split strings, search lists, and process images with amazing ease. If you need a scripting language for your Xgrid activities, Python comes highly recommended, not least because it is installed on every Mac sporting Panther or higher.
The last part of the puzzle is the Cocoa controller class, PIController,
which prepares the Xgrid job, and takes care of the User Interface (UI). We
won't deal with the UI here; the source code is there for all to see. Instead,
we will concentrate only on those parts of the controller that deal with preparing
jobs for Xgrid.
The awakeFromNib method of PIController reads the
filters.plist property list file to initialize the filters available
in Photo Industry.
-(void)awakeFromNib {
...
// Initialize available filters from plist file
NSBundle *bundle = [NSBundle bundleForClass:[self class]];
NSString *plistPath = [bundle pathForResource:@"filters" ofType:@"plist"];
NSData *data = [NSData dataWithContentsOfFile:plistPath];
NSString *errorString;
NSArray *filtersArray = [NSPropertyListSerialization propertyListFromData:data
mutabilityOption:NSPropertyListMutableContainers
format:NULL
errorDescription:&errorString];
NSAssert( nil != filtersArray, @"Could not read property list of filters." );
[self setFilters:filtersArray];
}
The main NSBundle is used to locate the file, and the data is
then read into an instance of NSData. This data is turned into
an array of filter information by the NSPropertyListSerialization
class. There are easier ways to do this, like simply calling the NSArray
method arrayWithContentsOfFile:, but we have taken the long route
because we want our array to be populated with mutable objects. The option NSPropertyListMutableContainers
achieves this objective. The objects need to be mutable, because they will be
used to store whether a filter is on or off, and this can be changed by the
user.
You may be wondering what sort of objects make up filtersArray.
They are simply NSMutableDictionary objects, as you can see by taking
a look in the filters.plist file.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<array>
<dict>
<key>filterId</key>
<string>thumbnail</string>
<key>filterName</key>
<string>Thumbnail</string>
<key>isOn</key>
<false/>
</dict>
<dict>
<key>filterId</key>
<string>blur</string>
<key>filterName</key>
<string>Blur</string>
<key>isOn</key>
<false/>
</dict>
As you can see, each dictionary in the array holds three key-value pairs: the filter's identifier, which is used in Photo Industry and the Python agent script to refer to the filter; the filter's name, which is what the filter is called in the UI; and whether it is on or off. All are initially turned off, but the user can turn on filters in the UI, and this requires that the dictionary entry be able to change (i.e., be mutable).
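For comparison, here is roughly how the same property list reads in Python using the standard plistlib module (Python 3). The two entries are copied from the excerpt, with the closing tags added so the sample parses on its own; Python dictionaries are always mutable, so no special option is needed to flip a filter on.

```python
import plistlib

PLIST = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN"
 "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<array>
  <dict>
    <key>filterId</key><string>thumbnail</string>
    <key>filterName</key><string>Thumbnail</string>
    <key>isOn</key><false/>
  </dict>
  <dict>
    <key>filterId</key><string>blur</string>
    <key>filterName</key><string>Blur</string>
    <key>isOn</key><false/>
  </dict>
</array>
</plist>
"""

filters = plistlib.loads(PLIST)   # a list of ordinary dictionaries
filters[1]["isOn"] = True         # the user switches the blur filter on
active_ids = [f["filterId"] for f in filters if f["isOn"]]
```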
Most of the PIController code that relates to Xgrid can be found
in the method applyFilters:toFilesWithPaths:forOutputDirectoryPath:.
If you write your own Xgrid-enabled Cocoa app, you will likely need a very similar
method, so let's take a good look at how it works.
We will skip anything that is related to the UI, rather than Xgrid. Here is how the method begins.
-(void)applyFilters:(NSArray *)filter toFilesWithPaths:(NSArray *)paths
forOutputDirectoryPath:(NSString *)eventualOutputDirPath {
...
// Create distributed task
NSFileManager *fm = [NSFileManager defaultManager];
DistributedTask *task =
[[[DistributedTask alloc] initWithControllerURLString:@"localhost"] autorelease];
// Store the distributed task. Register for delegate messages.
[self setDistributedTask:task];
[task setDelegate:self];
Here we are introduced to the class NSFileManager. You had better
get used to it, because this guy is going to be your partner for much of the
rest of this article. NSFileManager takes care of the stuff that
commands like mv, cp, rm, and ln
do in a shell. Anytime you need to move, copy, remove, or link a file, you know
who you have to see.
At this point we also create our distributed task, which is our interface
to Xgrid. The PIController is made the delegate of the task, and the
DistributedTask is initialized to make use of a controller on the
local machine, localhost. No attempt is made to check whether there
is actually a controller on the local host, and it is not possible to run Photo
Industry using a controller on another computer.
This simplistic approach could easily be improved upon: Controllers advertise
themselves with Rendezvous, so you can go looking for them, and when you find
them, you can query them about things like how many nodes they have available,
using the xgrid command-line tool (see the -node option
in the xgrid man page). Such an approach would lead to a much more
flexible piece of software, but it is too advanced for this introduction. You
can read a good introduction to using Rendezvous in Cocoa by Mike Beam here.
PIController then sets up some directories for the task.
// Set the output directory path.
[self setOutputDirectoryPath:eventualOutputDirPath];
// Setup a temporary directory.
// Also create a directory where the output will end up.
NSString *uniqueString = [[NSProcessInfo processInfo] globallyUniqueString];
NSString *dirName =
[NSString stringWithFormat:@"photoindustry_%@", uniqueString];
[self setTaskTempDirectoryPath:
[NSTemporaryDirectory() stringByAppendingPathComponent:dirName]];
NSString *taskOutputDirPath =
[taskTempDirPath stringByAppendingPathComponent:@"output"];
[fm createDirectoryAtPath:taskTempDirPath attributes:nil];
[fm createDirectoryAtPath:taskOutputDirPath attributes:nil];
It stores the output directory that the user has requested (i.e., eventualOutputDirPath).
This is not the output directory used by DistributedTask; it is
the place where all photos must eventually end up. The DistributedTask
will put its output in a temporary directory, which is created next. An NSProcessInfo
object is used to generate a unique string, which is then used to come up with
a name for the task's temporary directory, reducing the likelihood that any conflict
will occur. A temporary directory is created for all files used by the DistributedTask.
This is a subdirectory of the directory returned by the Cocoa function NSTemporaryDirectory.
In the task's directory, another subdirectory is created exclusively for output
from the DistributedTask. The NSFileManager is used
to do the directory creation.
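The directory layout is easy to reproduce in a Python sketch, which may help if you want to prototype the job setup in a script first. The function name make_task_directories is hypothetical, and uuid4 merely stands in for the unique string that NSProcessInfo supplies.

```python
import os
import shutil
import tempfile
import uuid


def make_task_directories(prefix="photoindustry_"):
    """Create <tmp>/<prefix><unique>/output and return both paths."""
    task_dir = os.path.join(tempfile.gettempdir(), prefix + uuid.uuid4().hex)
    output_dir = os.path.join(task_dir, "output")
    os.makedirs(output_dir)  # also creates task_dir on the way
    return task_dir, output_dir


task_dir, output_dir = make_task_directories()
created_ok = os.path.isdir(task_dir) and os.path.isdir(output_dir)
shutil.rmtree(task_dir)  # tidy up after the demonstration
```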
Next, a colon-separated list of filters is created, the same one that our Python script received on standard input.
// Create a standard input file for all subtasks
// This is just a colon-separated list of the filter ids of the filters
// that need to be applied
NSMutableArray *filterStringArray = [NSMutableArray arrayWithCapacity:10];
NSEnumerator *en = [filters objectEnumerator];
NSDictionary *filterDict;
while ( filterDict = [en nextObject] ) {
BOOL isOn = [[filterDict objectForKey:@"isOn"] boolValue];
if ( isOn )
[filterStringArray addObject:[filterDict objectForKey:@"filterId"]];
}
NSString *stdInString = [filterStringArray componentsJoinedByString:@":"];
NSString *siPath =
[taskTempDirPath stringByAppendingPathComponent:@"standardinput"];
[stdInString writeToFile:siPath atomically:NO];
We simply iterate over all the filter dictionaries in the filters
array, checking if they are on or off. If on, they are added to our list. Lastly,
this list is written to a file in the task's directory. Later this file will
be set as the standard input of the subtasks in our DistributedTask.
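In Python terms, building that string is a one-liner. The function name standard_input_for and the sample filter list are assumptions for illustration; the dictionary keys are the ones from filters.plist.

```python
def standard_input_for(filters):
    """Join the ids of all switched-on filters with colons."""
    return ":".join(f["filterId"] for f in filters if f["isOn"])


sample_filters = [
    {"filterId": "thumbnail", "filterName": "Thumbnail", "isOn": True},
    {"filterId": "blur", "filterName": "Blur", "isOn": False},
    {"filterId": "emboss", "filterName": "Emboss", "isOn": True},
]
stdin_string = standard_input_for(sample_filters)
```

If no filter is switched on, the result is an empty string, so the agent script should be prepared for empty input.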
Now the subtasks must be set up.
// Create an input directory for each subtask in the temporary directory.
// Copy photos into the input directories of subtasks.
// Distribute photos as evenly as possible amongst subtasks. If the
// number of photos doesn't exactly divide by the number of subtasks, some
// subtasks are required to process one extra photo.
unsigned baseNumPhotosPerSubTask = [paths count] / NumDistributedSubTasks;
unsigned numSubTasksWithOneExtra = [paths count] % NumDistributedSubTasks;
unsigned subTaskIndex, photoIndex = 0;
for ( subTaskIndex = 0; subTaskIndex < NumDistributedSubTasks;
subTaskIndex++ ) {
NSString *subTaskIndexString =
[NSString stringWithFormat:@"%u", subTaskIndex];
NSString *inputDirPath =
[taskTempDirPath stringByAppendingPathComponent:subTaskIndexString];
[fm createDirectoryAtPath:inputDirPath attributes:nil];
// Copy photos to the input directory for the subtask
unsigned numPhotosThisSubTask = baseNumPhotosPerSubTask;
if ( subTaskIndex < numSubTasksWithOneExtra ) ++numPhotosThisSubTask;
if ( numPhotosThisSubTask == 0 ) continue;
// Don't start subtask for no photos
unsigned subTaskPhotoIndex;
for ( subTaskPhotoIndex = 0; subTaskPhotoIndex < numPhotosThisSubTask;
subTaskPhotoIndex++ ) {
NSString *photoPath = [paths objectAtIndex:photoIndex];
[fm copyPath:photoPath toDirectoryAtPath:inputDirPath];
photoIndex++;
}
Some arithmetic determines how many photos each subtask should process. The
algorithm spreads the photos as evenly as possible over the subtasks: if the
number of photos does not divide exactly by the number of subtasks, the first
few subtasks each take one extra photo. The number of subtasks is a constant,
NumDistributedSubTasks, which is set to 4 elsewhere in the source.
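To make the arithmetic concrete, here is a small plain C sketch of the same distribution rule. The function name `photos_for_subtask` is hypothetical and does not appear in Photo Industry. It simply isolates the division and remainder logic used in the loop.

```c
#include <assert.h>

/* Number of photos assigned to the subtask at subtask_index when
   num_photos are spread over num_subtasks. The first
   (num_photos % num_subtasks) subtasks each take one extra photo. */
static unsigned photos_for_subtask(unsigned num_photos,
                                   unsigned num_subtasks,
                                   unsigned subtask_index)
{
    unsigned base, extra;
    assert(num_subtasks > 0 && subtask_index < num_subtasks);
    base  = num_photos / num_subtasks;   /* every subtask gets at least this */
    extra = num_photos % num_subtasks;   /* leftovers, one per early subtask */
    return base + (subtask_index < extra ? 1u : 0u);
}
```

With 10 photos and 4 subtasks, the subtasks receive 3, 3, 2, and 2 photos. With only 3 photos, the fourth subtask receives none, which is the case the `continue` statement in the loop guards against.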
A loop over subtasks begins, and an input directory is created for each subtask. The subtask's photos, the paths to which are passed to the method, are then copied to the input directory. Creating a link would be faster, but I found some unnerving behavior whenever a linked file is deleted: the Finder seems to think that all links to a file are deleted when any one of them is. This seems to be an error in Finder, not in the filesystem itself, because the linked file does continue to exist. Nonetheless, I thought it was safer to copy the original files so that should anything go wrong, they would not be lost.
You will not find the method copyPath:toDirectoryAtPath:, which
belongs to NSFileManager, in the Cocoa documentation. That's because
it belongs to a category that I have created in Photo Industry. This is what
it looks like.
@interface NSFileManager (PIControllerExtensions)
-(void)copyPath:(NSString *)path toDirectoryAtPath:(NSString *)dirPath;
@end

@implementation NSFileManager (PIControllerExtensions)
-(void)copyPath:(NSString *)path toDirectoryAtPath:(NSString *)dirPath {
    NSString *filePathInDir =
        [dirPath stringByAppendingPathComponent:[path lastPathComponent]];
    [[NSFileManager defaultManager] copyPath:path
                                      toPath:filePathInDir
                                     handler:nil];
}
@end
This method is a convenience: we regularly need to copy files into directories,
and it is a bit annoying to keep calling stringByAppendingPathComponent:
just to set up the new file path when the file name does not need to
change.
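The path manipulation that the category method hides can be illustrated with a plain C sketch. The function name `path_in_directory` is hypothetical, and the sketch makes simplifying assumptions that the Cocoa string methods do not: the file path must not end in a slash, and the directory path must not be empty.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds "<dir_path>/<last component of file_path>" in out.
   Returns 0 on success, -1 if out is too small.
   Assumes file_path does not end in '/' and dir_path is not empty. */
static int path_in_directory(const char *file_path, const char *dir_path,
                             char *out, size_t out_size)
{
    const char *slash, *name, *sep;
    size_t dir_len;
    int n;

    assert(file_path != NULL && dir_path != NULL && out != NULL);
    slash = strrchr(file_path, '/');
    name = slash ? slash + 1 : file_path;       /* keep only the file name */
    dir_len = strlen(dir_path);
    /* Avoid a doubled separator if dir_path already ends in '/'. */
    sep = (dir_len > 0 && dir_path[dir_len - 1] == '/') ? "" : "/";
    n = snprintf(out, out_size, "%s%s%s", dir_path, sep, name);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

The file keeps its own name and only the directory part changes, which is why the caller never needs to supply a destination file name.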
We can now finish off the applyFilters:toFilesWithPaths:forOutputDirectoryPath:
method.
        // Copy PIL for the input directory
        NSBundle *bundle = [NSBundle bundleForClass:[self class]];
        NSString *pilPath = [bundle pathForResource:@"PIL" ofType:nil];
        NSAssert( nil != pilPath, @"PIL path was nil." );
        [fm copyPath:pilPath toDirectoryAtPath:inputDirPath];
        // Add subtask to task
        NSString *scriptPath =
            [bundle pathForResource:@"agentrunscript" ofType:@"py"];
        [task addSubTaskWithIdentifier:[NSNumber numberWithInt:subTaskIndex]
                            launchPath:scriptPath
                  workingDirectoryPath:inputDirPath
                   outputDirectoryPath:taskOutputDirPath
                     standardInputPath:siPath
                    standardOutputPath:nil];
    }
    [task launch];
}
Still inside the loop over subtasks, we copy the directory containing PIL
to the input directory of the subtask. The subtask is then added to the distributed
task, using the method discussed earlier, setting the launch path, input and
output directories of the subtask, and the standard input file. We don't need
the standard output, so nil is passed for that. Finally, when all
subtasks have been added to the distributed task, the task is launched.
The progress of the DistributedTask is monitored using its delegate
methods. The PIController was set as the delegate to the task,
so it can implement the methods, and act upon them as required. We will take
a look at the method called when the DistributedTask has finished
running all its subtasks on Xgrid: distributedTaskDidFinishSubTasks:.
The method first checks that the output directory that the user requested
exists. If not, it creates it. If it does exist, and is not a directory, an
NSAssert ensures an exception is raised. In a more robust app,
you would want to handle this better, by informing the user of the problem.
-(void)distributedTaskDidFinishSubTasks:(DistributedTask *)distributedTask {
    NSFileManager *fm = [NSFileManager defaultManager];
    // Ensure output directory exists, and that it is a directory.
    BOOL isDir;
    if ( [fm fileExistsAtPath:[self outputDirectoryPath] isDirectory:&isDir] ) {
        NSAssert( isDir, @"Output directory path supplied was not a directory." );
    }
    else {
        [fm createDirectoryAtPath:[self outputDirectoryPath] attributes:nil];
    }
Next, the filtered photos, which should be in the output directory of the
DistributedTask, are moved to the user's chosen output destination.
    // Move task output files to the output directory
    NSString *taskOutputDirPath =
        [taskTempDirPath stringByAppendingPathComponent:@"output"];
    [fm changeCurrentDirectoryPath:taskOutputDirPath];
    NSDirectoryEnumerator *en = [fm enumeratorAtPath:taskOutputDirPath];
    NSString *relativePath; // Path relative to taskOutputDirPath
    while ( (relativePath = [en nextObject]) ) {
        // Skip anything that is not a JPEG
        if ( NSOrderedSame !=
                [[relativePath pathExtension] caseInsensitiveCompare:@"jpg"] &&
             NSOrderedSame !=
                [[relativePath pathExtension] caseInsensitiveCompare:@"jpeg"] )
            continue;
        NSString *fileOutputPath =
            [[self outputDirectoryPath] stringByAppendingPathComponent:
                [relativePath lastPathComponent]];
        if ( [fm fileExistsAtPath:fileOutputPath] )
            [fm removeFileAtPath:fileOutputPath handler:nil];
        [fm linkPath:relativePath toPath:fileOutputPath handler:nil];
    }
An NSDirectoryEnumerator is employed for this operation. It traverses the
contents of a directory, including subdirectories. For each file found, we
check whether it is a JPEG and, if so, link it into the output directory,
replacing any existing file of the same name. Yes, in this case we do use a
link instead of a copy, because the operation doesn't pose any threat to the
original photos, and is faster.
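The JPEG test applied to each enumerated file can be sketched in plain C as follows. The function name `has_jpeg_extension` is hypothetical, and the sketch relies on the POSIX function `strcasecmp`. It is meant only to illustrate the case-insensitive extension check, and it may differ from `pathExtension` in edge cases such as names that begin with a dot.

```c
#include <assert.h>
#include <string.h>
#include <strings.h>  /* strcasecmp (POSIX) */

/* Returns 1 if the last path component ends in ".jpg" or ".jpeg",
   ignoring case; otherwise returns 0. */
static int has_jpeg_extension(const char *path)
{
    const char *slash, *name, *dot;

    assert(path != NULL);
    slash = strrchr(path, '/');
    name = slash ? slash + 1 : path;   /* look only at the file name */
    dot = strrchr(name, '.');
    /* No dot means no extension; this sketch also treats a leading
       dot (a name such as ".jpg") as having no extension. */
    if (dot == NULL || dot == name) return 0;
    return strcasecmp(dot + 1, "jpg") == 0 ||
           strcasecmp(dot + 1, "jpeg") == 0;
}
```

Checking only the final path component matters here, because the enumerator returns paths relative to the task output directory and a file may sit inside a subdirectory.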
Finally, we clean up, by removing the entire task temporary directory, which includes all of the files and directories used by the subtasks.
    // Remove temporary directory
    [fm removeFileAtPath:taskTempDirPath handler:nil];
    ...
}
We have now covered the parts of Photo Industry that relate directly to Xgrid. If you download the source, you will see there is a lot of other stuff in the application, which could also be useful in your own apps. Photo Industry makes use of drag and drop, for example, and the new Cocoa bindings layer. A good intro to the drag-and-drop techniques can be found on CocoaDevCentral. Bindings are also covered by CocoaDevCentral here, and don't forget that old stalwart Mike Beam, who has recently written on the topic here.
I hope this two-part article has demonstrated the potential of Xgrid in
non-scientific Cocoa applications. We have had to do a lot of work in order
to leverage that potential, using the command-line xgrid tool,
but in all likelihood WWDC will alleviate that in the near future. Hopefully,
this article will be irrelevant after June 28, 2004, when SJ finally unveils
Apple's vision for Xgrid and the future of distributed computation. To be continued
...
Drew McCormack works at the Free University in Amsterdam, and develops the Cocoa shareware Trade Strategist.
Copyright © 2009 O'Reilly Media, Inc.