macdevcenter.com

Cleaning iPhoto

by brian d foy
02/27/2004

It seems that almost every other soldier in Iraq has a digital camera and a CD burner. So not only do I have my own photos, but I have a collection of CDs of photos from other soldiers, some of whom I know and some I don't. These CDs have been copied and passed around so much that we do not know who took some of the photos or which unit they were in. Indeed, some pictures show up on different CDs as people merged photo collections to create new ones.

Organizing these thousands of pictures by hand was a daunting task, especially since iPhoto is so slow, so I came up with some scripts to do it for me. The scripts are only a bit faster, but at least I can leave them running while I go do something else. I ended up using a mix of AppleScript and Perl's Mac::Glue module.

Upgrading to iPhoto 4

I left for the Middle East before Apple released iPhoto 4, which skipped version 3 and was reportedly faster than its predecessor, iPhoto 2.

I had to wait a while to get iPhoto 4 because it comes as part of iLife 4. That package also includes iTunes 4, which is available separately for free, and two applications I did not want, iDVD and iMovie. The whole package was more than I could download, even if Apple had offered that as an option. So I bought it for $49, and Apple shipped it to me on a CD and a DVD.

The upgrade was not much help -- certainly not $49 worth of help since even iPhoto 4 is sluggish on my G4 PowerBook. I still have hope that iPhoto will one day be usable, which is why I keep using it.

Using Multiple Libraries

I could only tolerate the previous versions of iPhoto if I kept the library size under a couple hundred photos, even though I have thousands of pictures. The file ~/Library/Preferences/com.apple.iPhoto.plist tells iPhoto which directory holds the photo library. I can change that myself, if I like, or I can use a program like iPhoto Library Manager or iPhoto Buddy.
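If I want to check which library iPhoto will open before switching, I can peek at that preferences file. The following is only a naive sketch. It assumes the file is in XML format and that iPhoto 4 stores the library location under a key named RootDirectory; neither assumption is confirmed here, so I would look at my own file first. The helper name plist_string_value is hypothetical, and a regex is no substitute for a real property-list parser.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: return the <string> value that follows
# <key>$key</key> in an XML property list, or undef if it is missing.
# Naive regex sketch: ignores nesting and non-string values.
sub plist_string_value
  {
  my( $xml, $key ) = @_;
  return $1 if $xml =~ m{<key>\Q$key\E</key>\s*<string>(.*?)</string>}s;
  return;
  }

# Assumption: iPhoto 4 keeps the library path under RootDirectory
my $plist = ( $ENV{HOME} || '' )
  . "/Library/Preferences/com.apple.iPhoto.plist";

if( open my $fh, '<', $plist )
  {
  my $xml  = do { local $/; <$fh> };   # slurp the whole file
  my $path = plist_string_value( $xml, "RootDirectory" );
  print defined $path ? "Library: $path\n" : "Key not found\n";
  }
```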

If I use multiple libraries, I can keep a small number of pictures in each one, so I can burn an entire library onto one CD. So far, iPhoto does not have a way to archive a large library across multiple CDs. iPhoto also tends to be responsive only with smaller libraries, which was a big problem with iPhoto 2. iPhoto 4 improves on this, but it still has a way to go.
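Deciding which photos go into which CD-sized library is a packing problem. The sketch below uses a simple first-fit approach with a hypothetical helper, batch_files. It takes file names and sizes in bytes and groups them so that no batch exceeds a capacity. The 650 MB disc size is an assumption, and the sketch does not talk to iPhoto at all.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: group [ name, bytes ] pairs into batches whose
# total stays at or under $capacity. First-fit: each file goes into the
# first batch with room, otherwise it starts a new batch.
sub batch_files
  {
  my( $capacity, @files ) = @_;
  my @batches;    # each batch: { size => total bytes, files => [ names ] }

  FILE: foreach my $file ( @files )
    {
    my( $name, $bytes ) = @$file;
    warn "$name is larger than one disc\n" if $bytes > $capacity;

    foreach my $batch ( @batches )
      {
      next if $batch->{size} + $bytes > $capacity;
      $batch->{size} += $bytes;
      push @{ $batch->{files} }, $name;
      next FILE;
      }

    # nothing had room, so this file starts a new batch
    push @batches, { size => $bytes, files => [ $name ] };
    }

  return map $_->{files}, @batches;
  }

my $cd = 650 * 1024 * 1024;    # assumed CD capacity in bytes
my @batches = batch_files( $cd,
  [ "a.jpg", 400e6 ], [ "b.jpg", 300e6 ], [ "c.jpg", 200e6 ] );
print scalar( @batches ), " batches\n";
```

First-fit does not always find the fewest discs, but it is easy to follow and good enough to keep each library under one CD.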

Finding Thumbnails

Some of the CDs we passed around also held thumbnails of the full-sized photos, produced by the HTML export feature of some photo software. I wanted to get rid of those.

When I delete a photo with any of the scripts in this article, I do not really remove it from iPhoto; iPhoto just moves it to the Trash, so I can still recover it if I make a mistake. When I really want to get rid of the photos, I empty the Trash with the "Empty Trash" item in the File menu. I keep a backup copy of all my iPhoto libraries in case I make a really big mistake (it's only happened a couple of times).

I wrote a script to go through each photo in the Photo Library and check its dimensions. If a photo is smaller than a certain size, I remove it. I first tried this with AppleScript.

To start, I get the count of photos in the Photo Library, then process them in reverse order, from the last photo to the first. If I started at the beginning and removed a photo, every photo after it would move down one number. I would then skip the photo that slid into the removed photo's position, and near the end I would try to access photo numbers that no longer exist.
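The same pitfall shows up with any indexed list. This small sketch uses a plain Perl array as a stand-in for iPhoto's photo list so that it runs anywhere; remove_matching is a hypothetical helper. It removes the elements that match a test, walking from the last index down to the first, just as the scripts in this article do.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: remove every element for which $test returns
# true, walking from the last index to the first. Removing element
# $index shifts only the elements after it, and those have already
# been checked, so nothing is skipped.
sub remove_matching
  {
  my( $test, $list ) = @_;
  for( my $index = $#$list; $index >= 0; $index-- )
    {
    splice( @$list, $index, 1 ) if $test->( $list->[$index] );
    }
  return $list;
  }

my @sizes = ( 120, 150, 800, 90, 1024 );
remove_matching( sub { $_[0] < 200 }, \@sizes );
print "@sizes\n";   # prints "800 1024"
```

A forward loop over the same array would remove 120, then look at index 1, which now holds 800. It would never test 150.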

For each photo index, I check the height and width of the photo. I figure the image is a thumbnail if either dimension is less than 200 pixels, an arbitrary threshold I chose. If either test is true, I remove the image. I have to remove the image from the Photo Library album itself, because if I remove it from an album that I created, it is still in the Photo Library.
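Pulled out on its own, the thumbnail rule is a one-line test. The sketch below wraps it in a hypothetical is_thumbnail helper with the threshold as an optional parameter (200 pixels by default). It treats a photo with unknown dimensions as "not a thumbnail", just as the Perl version of the script skips photos whose width or height is undefined.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: a photo is a thumbnail when either side is under
# the threshold. Unknown dimensions are never flagged.
sub is_thumbnail
  {
  my( $width, $height, $threshold ) = @_;
  $threshold = 200 unless defined $threshold;

  return 0 unless defined $width and defined $height;
  return 1 if $width < $threshold or $height < $threshold;
  return 0;
  }

foreach my $size ( [ 160, 120 ], [ 1600, 1200 ], [ 1600, 150 ] )
  {
  printf "%dx%d: %s\n", @$size,
    is_thumbnail( @$size ) ? "thumbnail" : "keep";
  }
```

Using "or" rather than "and" means a long, thin strip such as 1600x150 counts as a thumbnail too.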

When I run any of the scripts in this article, I should not interact with iPhoto, not even to look at photos in another album. The scripts drive iPhoto the same way I would by hand, so if I use iPhoto myself while they do their work, they get confused.


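  tell application "iPhoto"
    set myAlbum to photo library album
    -- count down so a removal does not renumber photos not yet checked
    repeat with myIndex from (count of photos in photo library album) to 1 by -1
      set thisPhoto to photo myIndex of photo library album
      -- either side under 200 pixels means it is probably a thumbnail
      if width of thisPhoto < 200 or height of thisPhoto < 200 then
        -- remove from the Photo Library album itself, not a user album
        remove thisPhoto from photo library album
      end if
    end repeat
  end tell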

For my library of approximately 6,000 photos, this script takes all night to run. I do not automatically empty the Trash (although iPhoto has an AppleScript command for that) so that I can manually inspect it to check if I accidentally deleted something I want to keep.

I tried a Perl version of this AppleScript, using the Mac::Glue module, to check if it would be any faster. I translated it as closely as I could to make a fair comparison, and this script finished the same library (which I had restored from a backup copy) in about an hour. That is quite the speed-up! Chris Nandor and Simon Cozens introduced Mac::Glue in an earlier article.


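  #!/usr/bin/perl
  use warnings;
  use strict;
  
  use Mac::Glue;
  
  # assumes the iPhoto glue was already created with gluemac
  my $iphoto  = Mac::Glue->new( "iPhoto" );
  
  # the Photo Library album holds every photo in the library
  my $album   = "photo library album";
  my $library = $iphoto->prop( $album );
  
  my $count   = $library->prop( "photos" )->count;
  print "My count is $count\n";
  
  # count down so a removal does not renumber photos not yet checked
  for( my $index = $count; $index > 0; $index-- )
    {
    my $photo  = $library->obj( photo => $index );
    my $width  = $photo->prop( "width" )->get;
    my $height = $photo->prop( "height" )->get;
    
    # skip anything iPhoto could not report dimensions for
    next unless( defined $width and defined $height );
    
    # either side under 200 pixels means it is probably a thumbnail
    if( $width < 200 or $height < 200 )
      {
      print "\t--->deleting $index: w $width h $height\n";
      $photo->remove();  # goes to the Trash, not gone for good
      }
    }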

Now all of the thumbnails in my library should be in the Trash, where I can recover them if I like. Once I inspect them and ensure I am not getting rid of anything I want to keep, I empty the Trash.

Getting Rid of GIFs

Not only does the Perl version run a lot faster than the equivalent AppleScript, but I also get to use all of the power of Perl to decide what to do between interactions with iPhoto.

When I imported some photos, I found some GIF images that were probably background images for the web pages that someone's photo manager created when exporting the photos. I also found one CD that had GIF slides from a PowerPoint presentation, which I thought was odd.

I start with the same Perl program I used earlier, but change the test to check the image's file name (an image also has a title, a name, and a path). The "image_filename" property holds the full file name with its extension. If that name ends in .gif in any case (so .GIF matches too), I remove the image. Again, these images end up in the Trash, where I can manually inspect them before I finally get rid of them.


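  #!/usr/bin/perl
  use warnings;
  use strict;
  
  use Mac::Glue;
  
  # assumes the iPhoto glue was already created with gluemac
  my $iphoto  = Mac::Glue->new( "iPhoto" );
  
  my $album   = "photo library album";
  my $library = $iphoto->prop( $album );
  
  my $count   = $library->prop( "photos" )->count;
  print "My count is $count\n";
  
  # count down so a removal does not renumber photos not yet checked
  for( my $index = $count; $index > 0; $index-- )
    {
    my $photo  = $library->obj( photo => $index );
    my $name   = $photo->prop( "image_filename" )->get;
    
    # check for a name before printing it, so an undefined name
    # does not trigger an uninitialized-value warning
    next unless defined $name;
    print "$index: $name\n";
    
    # .gif, .GIF, or any other mix of case at the end of the name
    if( $name =~ /\.gif$/i )
      {
      print "\t--->deleting $name\n";
      $photo->remove();  # goes to the Trash for later inspection
      }
    }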
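The file-name test can be tried on its own, without iPhoto. This sketch wraps a slight variation of the script's pattern in a hypothetical is_gif helper. It anchors with \z, which matches only at the very end of the string, instead of $, which also matches before a trailing newline. It then runs the helper over a few sample names, including ones that merely contain "gif".

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: true when the name ends in .gif in any case.
# \z anchors to the true end of the string.
sub is_gif
  {
  my( $name ) = @_;
  return defined $name && $name =~ /\.gif\z/i ? 1 : 0;
  }

foreach my $name ( qw( slide01.GIF background.gif gif_party.jpg photo.gif.jpg ) )
  {
  printf "%-16s %s\n", $name, is_gif( $name ) ? "remove" : "keep";
  }
```

Only the first two names match. The anchor keeps photo.gif.jpg, and the escaped dot keeps names like gif_party.jpg.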

I could also do this with AppleScript if I boned up on its string handling features. I would much rather use my AppleScript In a Nutshell book printed on real paper than make my way through Apple's PDF version of its more than 300-page AppleScript Language Guide.

Finding Duplicates

iPhoto can generally keep me from importing duplicate images. If it thinks an image I am importing is already in the Photo Library, it pops up a dialog that shows me both images. Most of the time iPhoto is right, but not always.

Instead of telling iPhoto to skip duplicates automatically, I tell it to import all files, and I handle duplicates myself later. This way I can walk away from the computer while iPhoto imports a CD full of pictures, which takes a long time. I do not have to worry that the import will stall on a dialog I am not there to answer.

I found removing duplicate images a bit trickier than my previous clean-ups. I assume that "duplicate" means the exact same image file. I do not try to catch the same photo with a different name, different comments, or other meta-information that someone might have changed.

I use the MD5 digest to determine which photos are the same. This digest is a digital fingerprint of the file's contents, and two different files are extremely unlikely to have the same fingerprint. For each photo, I get the image_path property, which is the full path to the image file. I then compute the MD5 digest by reading the file directly in the Perl program, without going through iPhoto. If that digest is not already in my %digests hash, I add it. If it is, I must have seen the exact same file before, which means the photo I am currently processing is a duplicate, so I remove it.


  #!/usr/bin/perl
  use warnings;
  use strict;
  
  use Digest::MD5;
  use Mac::Glue;
  
  # assumes the iPhoto glue was already created with gluemac
  my $iphoto  = Mac::Glue->new( "iPhoto" );
  
  my $album   = "photo library album";
  my $library = $iphoto->prop( $album );
  
  my $count   = $library->prop( "photos" )->count;
  print "My count is $count\n";
  
  my $md5     = Digest::MD5->new();
  my %digests = ();  # digest => path of the first file with that digest
  
  PHOTO: for( my $index = $count; $index > 0; $index-- )
    {
    my $photo  = $library->obj( photo => $index );
    my $path   = $photo->prop( "image_path" )->get;
    
    next unless defined $path;
    
    # read the image file directly, in binary mode, without iPhoto
    open my $fh, '<', $path or do { warn "$path: $!\n"; next PHOTO };
    binmode $fh;
    $md5->addfile( $fh );
    close $fh;
    
    my $digest = $md5->hexdigest;
    
    if( exists $digests{ $digest } )
      {
      # same contents as a file seen earlier, so this one is the duplicate
      print "$digests{ $digest }\n  -->$path\n";
      $photo->remove;
      }
    else
      {
      print "$index->$path: $digest\n";
      $digests{ $digest } = $path;
      }
      
    $md5->reset;
    }
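The same fingerprinting idea also works before anything reaches iPhoto, for example on a CD of photos. This sketch is not the script above. It uses File::Find to walk a directory tree, digests each file with Digest::MD5, and reports each file whose contents match an earlier one. It only reports matches and deletes nothing. find_duplicates is a hypothetical helper name, and the paths are sorted so the "first seen" file is predictable.

```perl
#!/usr/bin/perl
use warnings;
use strict;

use Digest::MD5;
use File::Find;

# Hypothetical helper: return [ duplicate, original ] pairs for files
# under $dir whose contents match a file seen earlier. Reports only.
sub find_duplicates
  {
  my( $dir ) = @_;
  my %first_seen;   # digest => first path with that digest
  my @duplicates;

  my @paths;
  find( sub { push @paths, $File::Find::name if -f $_ }, $dir );

  foreach my $path ( sort @paths )
    {
    open my $fh, '<', $path or do { warn "$path: $!\n"; next };
    binmode $fh;
    my $digest = Digest::MD5->new->addfile( $fh )->hexdigest;
    close $fh;

    if( exists $first_seen{ $digest } )
      {
      push @duplicates, [ $path, $first_seen{ $digest } ];
      }
    else
      {
      $first_seen{ $digest } = $path;
      }
    }

  return @duplicates;
  }

my $dir = shift @ARGV;
if( defined $dir )
  {
  print "$_->[0] duplicates $_->[1]\n" for find_duplicates( $dir );
  }
```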

I have to use the image_path property because it is the only property that should always point to the right file. Suppose I import two files with the same name and the same modification year and month; iPhoto sorts files into directories by those dates. In that case, iPhoto renames the new file so it does not overwrite the existing one. However, iPhoto still uses the original file name, without the extension, as the title, so two different files can end up with the same title. The title is the text I see with the photo when I select "View Title" from iPhoto's View menu.
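The naming behavior described above is easy to see in miniature. This sketch does not reproduce iPhoto's actual renaming rules. A hypothetical title_for helper derives a title the way the paragraph describes, dropping the directories and the extension with File::Basename's fileparse. Two different files copied from two CDs then get the same title, while their paths stay distinct.

```perl
#!/usr/bin/perl
use warnings;
use strict;

use File::Basename qw( fileparse );

# Hypothetical helper: the file name without its directories or its
# last extension, mimicking the title described in the text.
sub title_for
  {
  my( $path ) = @_;
  my( $title ) = fileparse( $path, qr/\.[^.]*/ );
  return $title;
  }

# Two different files, one title: only the path tells them apart.
foreach my $path ( "CD1/IMG_0001.JPG", "CD2/IMG_0001.JPG" )
  {
  printf "%s <- %s\n", title_for( $path ), $path;
  }
```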

Conclusion

iPhoto 4 made a lot of improvements, but it is still slow enough that I manage it with a lot of batch processing through either AppleScript or an equivalent Perl script. Mac::Glue can have a speed advantage over AppleScript, and with it I can use all of the power of Perl. Either way, once I script a task, I can walk away from my computer while iPhoto slogs along.

brian d foy is a Perl trainer for Stonehenge Consulting Services and is the publisher of The Perl Review.

