An Introduction to Artificial Intelligence

Image Classification

Now that we've covered the fundamentals and worked through some of the basics, let's consider a problem with widespread applicability: image classification. The postal service faces this problem every day when routing mail, and neural networks excel at this type of task. Given that a zip code contains five (or nine) digits, the problem becomes a matter of producing a digital output value for each of a series of symbols in an image.

To show how neural networks can be used to automate mail processing, let's work through the subproblem of identifying individual digits. We'll train the network with samples of several different fonts that stand in for handwriting samples, and then test the network with a completely different set of data to see how it performs. After all, for AI to be useful, it must perform well even when faced with new information.

To supply the network with input values for training, we'll scale a series of images to a reasonable size and provide each pixel in the image to the network as a discrete input value. A very simple file format to use for this task is the PNM format. If you are starting out with files in other formats, you can use the mogrify command to convert them to PNM. Type which mogrify in Terminal to determine whether mogrify is already installed on your machine (or try to find it using Spotlight). If you don't get a path back to the command, you'll need to install it with Fink or directly from ImageMagick's website.

To use mogrify to produce a 30 by 30 text-based file with 16 colors from a binary JPG file, for example, just type mogrify -format pnm -geometry 30x30 -colors 16 +compress image.jpg. Type mogrify -help for a listing of all of the options. Once this command completes, you can open the resulting PNM file in a text editor, where you should see contents laid out in the following or a similar format.

width height
color color color ...
color color color ...

If you want to view the resulting PNM file as an image and don't already have a capable viewer, the GIMP (also available through Fink) will happily display it for you if you open it via the GIMP's File -> Open command.
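
To make the "one input value per pixel" idea concrete, here is a minimal Python sketch that reads a text-based PNM file and returns a flat list of inputs scaled to the range 0 to 1. It is only an illustration, not the code used elsewhere in this article. The function names parse_plain_pnm and load_plain_pnm are hypothetical, and the sketch assumes the file is a plain PGM or PPM as defined by the Netpbm formats, meaning that a magic number (P2 or P3) precedes the width and height and a maximum color value follows them. Color pixels are averaged down to a single gray value, which is one choice among several.

```python
def parse_plain_pnm(text):
    """Parse a plain-text (ASCII) PGM/PPM image into one input value per pixel."""
    # Drop '#' comments, then split everything into whitespace-separated tokens.
    tokens = []
    for line in text.splitlines():
        tokens.extend(line.split("#", 1)[0].split())

    magic = tokens[0]
    if magic not in ("P2", "P3"):
        raise ValueError("expected a plain PGM (P2) or PPM (P3) file, got %r" % magic)

    width, height, maxval = int(tokens[1]), int(tokens[2]), int(tokens[3])
    samples = [int(t) for t in tokens[4:]]
    channels = 3 if magic == "P3" else 1  # P3 stores red, green, blue per pixel
    if len(samples) != width * height * channels:
        raise ValueError("pixel data does not match the stated dimensions")

    # Average the channels of each pixel and scale the result to [0, 1].
    inputs = []
    for i in range(0, len(samples), channels):
        pixel = samples[i:i + channels]
        inputs.append(sum(pixel) / (channels * maxval))
    return width, height, inputs


def load_plain_pnm(path):
    """Read a plain PNM file from disk (hypothetical helper for illustration)."""
    with open(path) as handle:
        return parse_plain_pnm(handle.read())


# A tiny 2 by 2 grayscale example with a maximum value of 15.
sample = "P2\n# 2x2 example\n2 2\n15\n0 15\n5 10\n"
width, height, inputs = parse_plain_pnm(sample)
print(width, height, len(inputs))  # 2 2 4
```

For a 30 by 30 image, the returned list holds 900 values, one for each input unit of the network.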

Some Guidance

Try starting out with a 30 by 30 image format with 16 colors, and expand from there. You can modify the network code to read the file format and train the network without much effort. Here are three different sets of numeric codes in JPG format that you can mogrify and use. Try using two of the sets for training purposes, and once the network performs well, try testing it with the third set.

The number of input units you use is determined by the image dimensions, so for a 30 by 30 image, that's 900 input units (not including the bias). You'll want to use ten output units, one per digit, in an output scheme like that of the 8-3-8 problem. The number of hidden layers and units can vary. You should theoretically be able to use one hidden layer with as few as four units, since four binary-valued units can distinguish up to sixteen patterns, but go ahead and experiment with other configurations. Also, try varying the learning rate and the number of training epochs. For 900 input units and a low learning rate, tens of thousands of training epochs are not unreasonable.
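
The sizing arithmetic and the output scheme described above can be sketched in a few lines of Python. This is a rough illustration rather than a prescribed design: the names layer_sizes, target_for, and digit_from are hypothetical, and the sketch assumes that the 8-3-8 style output scheme means exactly one output unit is switched on for each digit.

```python
import math

DIGITS = 10  # one output unit per digit, 0 through 9


def layer_sizes(width, height, hidden_units=None):
    """Return (inputs, hidden, outputs) for a network with one hidden layer."""
    n_inputs = width * height  # one input unit per pixel, bias not counted
    if hidden_units is None:
        # Smallest hidden layer that can hold a binary code for ten classes.
        hidden_units = math.ceil(math.log2(DIGITS))
    return n_inputs, hidden_units, DIGITS


def target_for(digit):
    """Target vector with 1.0 at the digit's position and 0.0 everywhere else."""
    return [1.0 if i == digit else 0.0 for i in range(DIGITS)]


def digit_from(outputs):
    """Read the network's answer as the index of the most active output unit."""
    return max(range(len(outputs)), key=lambda i: outputs[i])


print(layer_sizes(30, 30))  # (900, 4, 10)
print(target_for(3))
print(digit_from([0.1, 0.05, 0.2, 0.9, 0.1, 0.0, 0.3, 0.1, 0.2, 0.1]))  # 3
```

Picking the most active output unit, instead of requiring an exact match with the target vector, tolerates the imperfect activations a trained network typically produces on images it has not seen before.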

I'm going to continue to play with the iSight to learn how we can use it to capture and classify images. In the meantime, check out ALVINN (Autonomous Land Vehicle In a Neural Network), a project that used neural networks to drive unmanned land vehicles. If you're really hungry for more, Generation5 also has some good reading on neural networks.

Matthew Russell is a computer scientist from middle Tennessee and serves Digital Reasoning Systems as the Director of Advanced Technology. Hacking and writing are two activities essential to his renaissance man regimen.
