MacDevCenter    
 Published on MacDevCenter (http://www.macdevcenter.com/)


The Power of mdfind

by Andy Lester
01/04/2006

Tiger introduced Spotlight, a powerful searching mechanism that indexes the contents of all the files on your Mac, almost like magic. Want to find all the documents that mention Perl? Spotlight will do it for you. Just press ⌘+space bar (Command-space bar), and the Spotlight search box pops up. Type in "Perl" and even before you hit the Return key, Spotlight is searching its magic indices for the results.

Note that the documents found are grouped by type, and even include things you might not consider as documents. In this case, the search for Perl on my iBook found mail messages and Keynote presentations, as you'd probably expect, but also Address Book and iCal entries. It's also reassuring that it found mdfind.html, the working version of this article. Sample results from searching my iBook for "perl" are shown below.

Figure 1. Sample Spotlight results for a search on "perl"

All the Spotlight-related functionality is based on the idea of metadata, or data about data itself. A Word document contains data, but the fact that the document was created on October 18th, 2001 is metadata. (Metadata is abbreviated "MD" throughout.) Keyword searching on contents of files, not just filenames, is the most obvious way to use Spotlight, but it can also search based on date ranges ("Where's that mail message about OSCON that I wrote last week?") and document types ("Didn't I have a PDF that had Perl testing shortcuts on it?"). The full Spotlight search pane, accessed by clicking the Show All option at the top of the mini list, gives all the details.

In addition to the little blue magnifying glass in the upper-right corner of your desktop, Tiger provides the mdfind and mdls commands. When I discovered them while working on my updates to Mac OS X Tiger In A Nutshell, I fell in love. I had the power of Spotlight available to me from the Unix shell. Experienced Unix users will find mdfind's interface familiar. Mac power users who have never used the Unix under the hood of Tiger are in for a treat.

Starting with mdfind

I've worked on a number of different books for different publishers. Files relating to those projects are scattered around my hard drive. Let's say I'm looking for a copy of an invoice I sent to Apress for my work on Pro Perl Debugging. The easiest way to start is to ask mdfind to do a simple keyword search on the word "invoice."

$ mdfind invoice
/Applications/Microsoft Office 2004/Templates/Business Forms/Invoices
/Users/andy/Library/Mail/IMAP-andy@mail/KMSD/Messages/45070.emlx
/Users/andy/Library/Mail/IMAP-andy@mail/KMSD/Messages/45071.emlx
/Users/andy/Library/Mail/IMAP-andy@mail/KMSD/Messages/45068.emlx
/Users/andy/Library/Mail/IMAP-andy@mail/KMSD/Messages/45069.emlx
/Users/andy/Desktop/Tape backup.doc
/Users/books/oracle-cd/webapp/ch09_05.htm
... 102 more filenames ...

What I get back is a list of 110 files that match the word "invoice" somewhere in their contents. The first hit is a directory of templates created by Office, followed by some mail messages about custom work being done for a customer in my day job. Then there's a document proposing a new tape backup server, and then a page from the (unfortunately discontinued) Oracle CD Bookshelf that I've copied to my local hard drive. Digging through 110 filenames, especially if I have to open them to see what's inside, would be tedious.

I'll narrow down the search by adding terms. Since I worked on the book for Apress, I'll add that as a keyword. All words specified in a search term are ANDed together. Since I'm passing multiple words, and I want the Unix shell to pass them as one argument to mdfind, I need to put them in double quotes.

$ mdfind "invoice apress"
/Users/alester/pro-perl-debugging/admin/TR.Invoice.Lester.Foley and McMahon.doc
/Users/alester/pro-perl/admin/TR.Invoice.Lester.Wainwright.doc

Now we're down to two hits, and it's clear I want the first file.

Boolean Operators

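All words passed in a query string to mdfind are implicitly ANDed together. That is, "invoice apress" means both words must appear. Spotlight allows other Boolean operators as well:

| (pipe): OR, matches documents containing either term
- (minus): NOT, excludes documents containing the term
( ) (parentheses): groups terms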

Working with these operators can be tricky. Whitespace is significant when building queries. To get all documents with "invoice" or "o'reilly", I write

$ mdfind "invoice|o'reilly"

with no spaces between the terms. If I want to find all documents with "invoice" but not "apress", it's

$ mdfind "invoice(-apress)"

with no intervening spaces, and parentheses around the term I want to exclude. To get a list of invoices or contracts from O'Reilly, I'd use

$ mdfind "(invoice|contract) o'reilly"
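Because whitespace matters in these queries, one option is to let the shell assemble the OR list. The following is a minimal sketch, not a feature of mdfind: any_of is a hypothetical helper name, and it assumes the terms contain no spaces or pipe characters.

```shell
# any_of is a hypothetical helper: it joins its arguments with "|" and
# no surrounding spaces, the form the OR operator needs.
# The body runs in a subshell, so the IFS change does not leak out.
any_of() (
    IFS='|'
    printf '%s\n' "$*"
)

# Build the query string for invoices or contracts from O'Reilly.
query="($(any_of invoice contract)) o'reilly"
printf '%s\n' "$query"
```

The resulting string can then be handed over as a single argument with mdfind "$query".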

Note that in every example with more than one word, I put double quotes around the search term. This makes the Unix shell pass the words to mdfind as a single parameter. Otherwise, mdfind uses only the last word, so that

$ mdfind invoice contract

is the same as

$ mdfind contract

It also prevents the shell from intercepting characters that it would use as special, like the parentheses, and passes them unmolested to mdfind. This is especially important if I try to search for "O'Reilly". Without quotes, I get this:

$ mdfind O'Reilly
>

The angle bracket is the shell telling me "You started a quoted string, and now I'm waiting for you to finish it." It will sit and wait for input until it sees another single quote, or I type Ctrl-C to cancel. The shell has interpreted the single quote in "O'Reilly" as the start of a quoted string. Instead, I want

$ mdfind "O'Reilly"
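Double quotes are not the only way to get an apostrophe past the shell. As a small illustration of plain shell quoting, the three spellings below produce the same single argument. The variables a, b, and c are illustrative names only, and printf stands in for mdfind so the result is visible on any system.

```shell
# Three ways to write the one-word argument O'Reilly.
a="O'Reilly"        # double quotes protect the apostrophe
b=O\'Reilly         # a backslash escapes just the apostrophe
c='O'\''Reilly'     # close the single quotes, escape, reopen

# Each line printed is identical: O'Reilly
printf '%s\n' "$a" "$b" "$c"
```

Any of the three can be used in place of the quoted term, for example mdfind O\'Reilly.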

Narrowing Search Results

Sometimes I know the keywords to search for, but there are too many files on my drive to wade through. That's when I turn to mdfind's handy -onlyin option. It restricts the files returned to files in a specific directory, and the directories below it. This may also speed up searching significantly, since mdfind only has to search a small part of its index for my files.

If I know my invoice is probably somewhere in my home directory's subdirectory called writing, I can use

$ mdfind -onlyin /Users/alester/writing invoice

or, using the tilde character to tell the shell to expand to my home directory, shorten it as

$ mdfind -onlyin ~/writing invoice

I can have multiple -onlyin options, so if I have a folder of stuff I've been meaning to file away, I can include that in my search with:

$ mdfind -onlyin ~/writing -onlyin "/Users/alester/to be filed" invoice

Note that because of the spaces in /Users/alester/to be filed I must put the pathname in double quotes. This also means I can't use the tilde shortcut, because the tilde is a shell character that won't be expanded in quotes.
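There are ways to keep the shortcut, though. A sketch of two standard shell idioms follows: leave the tilde outside the quotes, or use $HOME, which is expanded inside double quotes. The variable names dir and same are illustrative only.

```shell
# Leave the tilde outside the quotes so the shell still expands it,
# and quote only the part of the path that contains spaces.
dir=~/"to be filed"
printf '%s\n' "$dir"

# $HOME is expanded inside double quotes, so this names the same folder.
same="$HOME/to be filed"
printf '%s\n' "$same"
```

With either form the earlier search could be written as mdfind -onlyin ~/writing -onlyin ~/"to be filed" invoice.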

Unfortunately, mdfind doesn't understand the period to mean "the current directory," but I can use the special shell variable $PWD instead:

$ cd ~/Music/iTunes
$ mdfind -onlyin "$PWD" "East Bound And Down"
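If searching the current directory is a frequent need, the $PWD trick can be wrapped in a shell function. This is only a sketch: mdhere is a made-up name, not a command that ships with Tiger.

```shell
# mdhere is a hypothetical wrapper: it searches only the current
# directory and below. Quoting "$PWD" keeps a path that contains
# spaces in one piece, and "$@" passes the search terms through as given.
mdhere() {
    mdfind -onlyin "$PWD" "$@"
}
```

After cd ~/Music/iTunes, the search above becomes mdhere "East Bound And Down".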

Filtering mdfind's output

Another way to narrow the number of hits is to analyze the output of mdfind. Unix has many programs called filters that take output from one program, analyze or modify it, and create a different set of output.

One of the most common filters is the grep command, which searches a set of input for lines that match a given pattern. In this case, I want grep to show me only lines that have the word "Perl" in them somewhere:

$ mdfind invoice | grep Perl
$

That didn't return the results I want. I'll try it again with the -i flag to tell grep to do case-insensitive matching.

$ mdfind invoice | grep -i Perl
/Users/alester/pro-perl-debugging/admin/TR.Invoice.Lester.Foley and McMahon.doc
/Users/alester/pro-perl/admin/TR.Invoice.Lester.Wainwright.doc

Now I've found the results I wanted. I could have rerun it with grep perl, but then I would have missed results that might have been spelled "Perl".

A downside of this technique is that grep sees only the pathnames that mdfind prints, not the contents of the files. Even though both of these books I worked on were for Apress, if I'd tried to grep on "Apress", I'd have come up empty, because "Apress" doesn't appear in any of the file or directory names.

$ mdfind invoice | grep -i apress
$

Another useful grep option is -v. It tells grep to show only lines that do not match the expression. For example, if I want to exclude all the results from my IMAP Mail account, I can use

$ mdfind invoice | grep -v IMAP

and I won't see any results where "IMAP" appears in the filename or directory. This can be dangerous, since if I've invoiced the mythical HandiMap company and saved it as /Users/alester/HANDIMAP/invoice-2005.doc, it will be excluded from the results, too.
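One way to reduce that risk is to make the pattern more specific. The sketch below uses two sample pathnames in place of real mdfind output (the second is the hypothetical HandiMap invoice), and assumes the Mail folders follow the /Library/Mail/IMAP- layout seen in the earlier results.

```shell
# Sample pathnames standing in for mdfind output.
hits='/Users/andy/Library/Mail/IMAP-andy@mail/KMSD/Messages/45070.emlx
/Users/alester/HANDIMAP/invoice-2005.doc'

# Match the Mail folder layout rather than the bare string "IMAP",
# so only the mail message is dropped.
kept=$(printf '%s\n' "$hits" | grep -v '/Library/Mail/IMAP-')
printf '%s\n' "$kept"
```

Against live results the same filter would read mdfind invoice | grep -v '/Library/Mail/IMAP-'.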

Another handy Unix tool is the program wc. wc stands for "word count," but with the -l flag it shows me the number of lines in the input passed to it.

$ mdfind invoice | wc -l
     110

Here I find that mdfind returned 110 files matching "invoice". Surely you didn't think I would have counted all 110 file matches at the beginning of this section by hand, did you?
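Filters can also be chained. As a sketch of that idea, the function below counts hits per directory using the standard sed, sort, and uniq tools; count_by_dir is a hypothetical name, and the function reads pathnames on standard input.

```shell
# count_by_dir is a hypothetical helper. It strips the filename from
# each path, then counts how often each directory occurs and lists
# the busiest directories first.
count_by_dir() {
    sed 's|/[^/]*$||' | sort | uniq -c | sort -rn
}

# Three sample paths: two share a directory, so /a/b is listed first.
printf '%s\n' /a/b/x /a/b/y /a/c/z | count_by_dir
```

With real data this would be mdfind invoice | count_by_dir, which shows at a glance where most of the matches live.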

Listing Metadata with mdls

The mdls command is the partner to mdfind. The ls portion of mdls is an analogue to the Unix command ls which lists files in a directory. In this case, mdls lists the metadata attributes associated with a given file.

Here's the metadata for a Word document I created when updating Mac OS X Tiger In A Nutshell.

$ mdls ~/mosxnut3/ch02-addendum.doc
ch02-addendum.doc -------------
kMDItemAttributeChangeDate
    = 2005-12-11 21:45:50 -0600
kMDItemAuthors                 = ("Andy Lester")
kMDItemContentCreationDate
    = 2005-09-14 21:26:58 -0500
kMDItemContentModificationDate
    = 2005-09-14 21:26:58 -0500
kMDItemContentType
    = "com.microsoft.word.doc"
kMDItemContentTypeTree
    = ("com.microsoft.word.doc", "public.data",
    "public.item")
kMDItemDisplayName
    = "ch02-addendum.doc"
kMDItemFSContentChangeDate
    = 2005-09-14 21:26:58 -0500
kMDItemFSCreationDate
    = 2005-09-14 21:26:58 -0500
kMDItemFSCreatorCode           = 0
kMDItemFSFinderFlags           = 0
kMDItemFSInvisible             = 0
kMDItemFSIsExtensionHidden     = 0
kMDItemFSLabel                 = 0
kMDItemFSName
    = "ch02-addendum.doc"
kMDItemFSNodeCount             = 0
kMDItemFSOwnerGroupID          = 501
kMDItemFSOwnerUserID           = 501
kMDItemFSSize                  = 68382
kMDItemFSTypeCode              = 0
kMDItemID                      = 246252
kMDItemKeywords
    = (Tiger, Nutshell, macosx)
kMDItemKind
    = "Microsoft Word document"
kMDItemLastUsedDate
    = 2005-12-11 21:45:47 -0600
kMDItemTitle
    = "Mac OS X Tiger In A Nutshell -- Chapter
    2 -- additional commands"
kMDItemUsedDates
    = (2005-09-14 21:26:58 -0500, 2005-12-11
    18:00:00 -0600)

The attribute names should be pretty self-explanatory. FS refers to the filesystem, the name for how files are stored on the hard drive, so all the kMDItemFS attributes give information about the files themselves, and not the content. Note that this may differ from information held inside the file in its own format: kMDItemFSCreationDate is the date the file was created on disk, while kMDItemContentCreationDate describes the document's content.

Each different file format may have specific information unique to that format. The values Tiger, Nutshell, and macosx were entered by me in Microsoft Word in File Properties, which Spotlight then indexed into the kMDItemKeywords attribute. Some metadata is figured out by Spotlight itself, as with the dimensions of a JPEG image.

The attributes for a media file are very different.

$ mdls "05 Power Of Two.m4a"
05 Power Of Two.m4a -------------
kMDItemAlbum                    = "Swamp Ophelia"
kMDItemAttributeChangeDate
    = 2005-11-03 19:08:52 -0600
kMDItemAudioBitRate             = 112024
kMDItemAudioChannelCount        = 2
kMDItemAudioEncodingApplication
    = "iTunes v6.0.1, QuickTime 7.0.3"
kMDItemAudioTrackNumber         = 5
kMDItemAuthors                  = ("Indigo Girls")
kMDItemCodecs                   = (AAC)
kMDItemComposer                 = "Saliers, Emily"
kMDItemContentCreationDate
    = 2005-11-03 19:08:28 -0600
kMDItemContentModificationDate
    = 2005-11-03 19:08:52 -0600
kMDItemContentType
    = "public.mpeg-4-audio"
kMDItemContentTypeTree          = (
    "public.mpeg-4-audio",
    "public.audio",
    "public.audiovisual-content",
    "public.data",
    "public.item",
    "public.content"
)
kMDItemDisplayName
    = "05 Power Of Two.m4a"
kMDItemDurationSeconds
    = 322.5483900226757
kMDItemFSContentChangeDate
    = 2005-11-03 19:08:52 -0600
kMDItemFSCreationDate
    = 2005-11-03 19:08:28 -0600
kMDItemFSCreatorCode            = 1752133483
kMDItemFSFinderFlags            = 0
kMDItemFSInvisible              = 0
kMDItemFSIsExtensionHidden      = 0
kMDItemFSLabel                  = 0
kMDItemFSName
    = "05 Power Of Two.m4a"
kMDItemFSNodeCount              = 0
kMDItemFSOwnerGroupID           = 20
kMDItemFSOwnerUserID            = 501
kMDItemFSSize                   = 4582797
kMDItemFSTypeCode               = 1295270176
kMDItemID                       = 2099725
kMDItemKind
    = "MPEG-4 Audio File"
kMDItemLastUsedDate
    = 2005-11-03 19:08:29 -0600
kMDItemMediaTypes               = (Sound)
kMDItemMusicalGenre             = "Rock"
kMDItemStreamable               = 0
kMDItemTitle                    = "Power Of Two"
kMDItemTotalBitRate             = 112024
kMDItemUsedDates
    = (2005-11-03 19:08:29 -0600)

If I'm only interested in certain attributes, I can use the -name option:

$ mdls -name kMDItemComposer "11 Space Truckin'.m4a"
11 Space Truckin'.m4a -------------
kMDItemComposer = "Blackmore/Gillan/Glover/Lord/Paice"
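For use in a script, the bare value is often more convenient than the whole line. The following sketch pulls it out with sed. md_value is a hypothetical helper name, and it assumes a quoted string value on a single line, as in the output above; list values such as kMDItemAuthors are not handled.

```shell
# md_value is a hypothetical helper: it reads mdls output on standard
# input and prints the text between the double quotes of any
# kMDItem... = "..." line. Other lines, such as the header, are ignored.
md_value() {
    sed -n 's/^kMDItem[A-Za-z]*[[:space:]]*=[[:space:]]*"\(.*\)"$/\1/p'
}

# The sample line from the output above stands in for a real mdls call.
composer=$(printf '%s\n' 'kMDItemComposer = "Blackmore/Gillan/Glover/Lord/Paice"' | md_value)
printf '%s\n' "$composer"
```

On a Mac the input would come from mdls itself, as in mdls -name kMDItemComposer "11 Space Truckin'.m4a" | md_value.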

Now that I know some attribute names, I can get very precise in how I search. Say I want to find songs composed by Roger Waters. I need to search the kMDItemComposer attribute for "Waters". I'll put the string I'm searching for in double quotes, and then the entire search expression in single quotes.

$ mdfind 'kMDItemComposer = "Waters"'
/Users/andy/Music/iTunes/iTunes Music/Pink Floyd/
    Pulse/2-06 Money.m4a
/Users/andy/Music/iTunes/iTunes Music/Pink Floyd/
    Pulse/2-09 Brain Damage.m4a
/Users/andy/Music/iTunes/iTunes Music/Pink Floyd/
    Pulse/2-10 Eclipse.m4a

I know that I have more than three songs written by Roger Waters, so I'll rerun the search with wildcards, with an asterisk to mean "any string."

$ mdfind 'kMDItemComposer = "*Waters*"'
/Users/andy/Music/iTunes/iTunes Music/Pink Floyd/
    Animals/02 Dogs.m4a
/Users/andy/Music/iTunes/iTunes Music/Pink Floyd/
    Dark Side Of The Moon/01 Speak To Me _ Breathe.m4a
/Users/andy/Music/iTunes/iTunes Music/Pink Floyd/
    Dark Side Of The Moon/02 On The Run.m4a
... 42 more tracks ...

If I want a case-insensitive search, I can put the letter c outside the double quotes, as in this search to find all forms of "McCartney", regardless of the capitalization.

$ mdfind 'kMDItemComposer = "*mccartney*"c'
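Getting the nested quotes, the asterisks, and the trailing c right by hand is fiddly, so a small function can build the expression. This is a sketch under stated assumptions: md_contains is a made-up name, the search text must not contain double quotes or backslashes, and the query uses the == form that appears in the Appendix B script.

```shell
# md_contains is a hypothetical helper. Given an attribute name and
# some text, it prints a case-insensitive "contains" query expression.
md_contains() {
    printf '%s == "*%s*"c\n' "$1" "$2"
}

query=$(md_contains kMDItemComposer mccartney)
printf '%s\n' "$query"
```

The expression is then passed as one argument: mdfind "$query".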

Of course, all of this searching for data like composer names depends on the accuracy of the data in the files themselves. Chances are, if you ripped a CD into your iTunes, the data about the music came from an automatic lookup to the Gracenote database, which may or may not have such information entered. If the data's not in the file, then Spotlight can't search against it.

I'm not limited to testing for strings. I can also compare numeric values with the standard comparison operators. Maybe I want to find all my music files that were encoded at a bit rate lower than 128 kbps:

$ mdfind 'kMDItemAudioBitRate < 128000'

or all the songs longer than 10 minutes

$ mdfind 'kMDItemDurationSeconds > 600'
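Since the attribute is measured in seconds, shell arithmetic can do the conversion from minutes. A minimal sketch, with minutes and query as illustrative variable names:

```shell
# Convert a length in minutes to the seconds the attribute expects.
minutes=10
query="kMDItemDurationSeconds > $((minutes * 60))"
printf '%s\n' "$query"
```

The query is built in double quotes so that the arithmetic is expanded, and it is then passed along as mdfind "$query".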

Summary

Remember that mdfind is a file-level utility. It finds files that match, but provides no context for them. It also only provides file-level granularity. For example, since I use Apple's Mail program, which stores individual mail messages as separate files, mdfind returns individual mail messages that match my searches. However, mail programs like Eudora store an entire folder of messages in one file in mbox format. If one message in that box matches a search, mdfind will show the file as a match, but not which message in the file made the match.

I hope you've found this overview of mdfind illuminating. Much of the information has been taken from other articles and comments around the Web, since Apple's documentation on mdfind is so sparse. Here's hoping that a future update to Tiger enhances the documentation.

Appendix A: A Summary of Common Attributes

From Chapter 2 of Mac OS X Tiger In A Nutshell.

kMDItemAttributeChangeDate
The date and time that a metadata attribute was last changed.
kMDItemAudiences
The intended audience of the file.
kMDItemAuthors
The authors of the document.
kMDItemCity
The document's city of origin.
kMDItemComment
Comments regarding the document.
kMDItemContactKeywords
A list of contacts associated with the document.
kMDItemContentCreationDate
The document's creation date.
kMDItemContentModificationDate
Last modification date of the document.
kMDItemContentType
The qualified content type of the document, such as com.adobe.pdf for PDF files and com.apple.protected-mpeg-4-audio for a protected Advanced Audio Coding (AAC) file.
kMDItemContributors
Contributors to this document.
kMDItemCopyright
The copyright owner.
kMDItemCountry
The document's country of origin.
kMDItemCoverage
The scope of the document, such as a geographical location or a period of time.
kMDItemCreator
The application that created the document.
kMDItemDescription
A description of the document.
kMDItemDueDate
Due date for the item represented by the document.
kMDItemDurationSeconds
Duration (in seconds) of the document.
kMDItemEmailAddresses
Email addresses associated with this document.
kMDItemEncodingApplications
The name of the application (such as Acrobat Distiller) that was responsible for converting the document to its current form.
kMDItemFinderComment
Any Finder comments for the document.
kMDItemFonts
Fonts used in the document.
kMDItemHeadline
A headline-style synopsis of the document.
kMDItemInstantMessageAddresses
IM addresses/screen names associated with the document.
kMDItemInstructions
Special instructions or warnings associated with this document.
kMDItemKeywords
Keywords associated with the document.
kMDItemKind
Describes the kind of document, such as an iCal Event.
kMDItemLanguages
Language of the document.
kMDItemLastUsedDate
The date and time the document was last opened.
kMDItemNumberOfPages
Page count of this document.
kMDItemOrganizations
The organization that created the document.
kMDItemPageHeight
Height of the document's page layout in points.
kMDItemPageWidth
Width of the document's page layout in points.
kMDItemPhoneNumbers
Phone numbers associated with the document.
kMDItemProjects
Names of projects (other documents, such as an iMovie project) that this document is associated with.
kMDItemPublishers
The publisher of the document.
kMDItemRecipients
The recipient of the document.
kMDItemRights
A link to the statement of rights (such as a Creative Commons or old-school copyright license) that governs the use of the document.
kMDItemSecurityMethod
Encryption method used on the document.
kMDItemStarRating
Rating of the document (as in the iTunes "star" rating).
kMDItemStateOrProvince
The document's state or province of origin.
kMDItemTitle
The title.
kMDItemVersion
The version number.
kMDItemWhereFroms
Where the document came from, such as a URI or email address.

Appendix B: Finding Long Songs

Here's a little Perl program to find songs longer than a certain number of minutes, and report on them in a friendly format, in reverse order of length. It uses mdfind to get a list of files for songs over a certain length, and then uses mdls to extract the details, and reports on its findings.

#!/usr/bin/perl

use warnings;
use strict;

# Get number of minutes from command line.
my $minutes = shift || 10; # default 10
my $seconds = $minutes * 60;

# Both conditions must hold, so join them with &&
my @constraints = (
    "kMDItemDurationSeconds > $seconds",
    'kMDItemMediaTypes == "Sound"',
);
my $query = join( " && ", @constraints );

# The list form of open bypasses the shell, so quotes
# in the query and in filenames need no escaping
open( my $mdfind, "-|", "mdfind", $query )
    or die "Can't run mdfind: $!\n";
chomp( my @filelist = <$mdfind> ); # Remove newlines
close $mdfind;

die "You don't have any songs over ",
    "$minutes minutes long!\n" unless @filelist;

my @fileinfo; # List of matching files & stats
for my $filename ( @filelist ) {
    my %fields;

    # Call mdls on the file and scan each line
    open( my $mdls, "-|", "mdls", $filename )
        or die "Can't run mdls: $!\n";
    while ( <$mdls> ) {
        # Find lines with key/value pairs
        if ( /^kMDItem(\w+)\s+=\s+(.*)/ ) {
            # Extract the keys and values
            my ($key,$value) = ($1,$2);

            # Strip surrounding parens & quotes
            $value =~ s/^\(|\)$//g;
            $value =~ s/^"|"$//g;

            # Stash the key/value pair
            $fields{$key} = $value;
        }
    } # for each line of mdls output
    close $mdls;

    # Fill in missing tags so printf doesn't warn
    for my $key ( qw( Title Authors Album ) ) {
        $fields{$key} = "(unknown)"
            unless defined $fields{$key};
    }
    push( @fileinfo, \%fields );
}

# Sort in decreasing order of length
@fileinfo = sort {
    $b->{DurationSeconds}
        <=>
    $a->{DurationSeconds}
    } @fileinfo;

# Print the specs for each song
for my $file ( @fileinfo ) {
    printf( qq{%2d:%02d "%s" by %s from "%s"\n},
        $file->{DurationSeconds}/60,
        $file->{DurationSeconds}%60,
        $file->{Title},
        $file->{Authors},
        $file->{Album},
    );
}

$ perl longsongs 9
20:34 "2112" by Rush from "2112"
18:36 "Alice's Restaurant Massacree" by Arlo 
    Guthrie from "The Best Of Arlo Guthrie"
...
9:16 "Between I And Thou" by The Mermen from 
    "A Glorious Lethal Euphoria"
9:05 "Watermelon In Easter Hay" by Frank Zappa
    from "Joe's Garage"
9:03 "Slow Burn" by Silkworm from "Even A Blind
    Chicken Finds A Kernel Of Corn Now And Then"
9:00 "The Load-Out / Stay" by Jackson Browne from
    "Running On Empty"

Andy Lester is a QA & Release Manager for Socialtext. He is also in charge of PR for The Perl Foundation and maintains over 25 modules on CPAN.


Copyright © 2009 O'Reilly Media, Inc.