Published on MacDevCenter

The Power of mdfind

by Andy Lester

Tiger introduced Spotlight, a powerful searching mechanism that indexes the contents of all the files on your Mac, almost like magic. Want to find all the documents that mention Perl? Spotlight will do it for you. Just press ⌘+space bar (Command-space bar), and the Spotlight search box pops up. Type in "Perl" and even before you hit the Return key, Spotlight is searching its magic indices for the results.

Note that the documents found are grouped by type, and even include things you might not consider as documents. In this case, the search for Perl on my iBook found mail messages and Keynote presentations, as you'd probably expect, but also Address Book and iCal entries. It's also reassuring that it found mdfind.html, the working version of this article. Sample results from searching my iBook for "perl" are shown below.

Figure 1. Spotlight results for a search on "perl"

All the Spotlight-related functionality is based on the idea of metadata, or data about data itself. A Word document contains data, but the fact that the document was created on October 18th, 2001 is metadata. (Metadata is abbreviated "MD" throughout.) Keyword searching on contents of files, not just filenames, is the most obvious way to use Spotlight, but it can also search based on date ranges ("Where's that mail message about OSCON that I wrote last week?") and document types ("Didn't I have a PDF that had Perl testing shortcuts on it?"). The full Spotlight search pane, accessed by clicking the Show All option at the top of the mini list, gives all the details.

In addition to the little blue magnifying glass in the upper-right corner of your desktop, Tiger provides the mdfind and mdls commands. When I discovered them while working on my updates to Mac OS X Tiger In A Nutshell, I fell in love. I had the power of Spotlight available to me from the Unix shell. Experienced Unix users will find mdfind's interface familiar. Mac power users who have never used the Unix under the hood of Tiger are in for a treat.

Starting with mdfind

I've worked on a number of different books for different publishers. Files relating to those projects are scattered around my hard drive. Let's say I'm looking for a copy of an invoice I sent to Apress for my work on Pro Perl Debugging. The easiest way to start is to ask mdfind to do a simple keyword search on the word "invoice."

$ mdfind invoice
/Applications/Microsoft Office 2004/Templates/Business Forms/Invoices
/Users/andy/Desktop/Tape backup.doc
... 102 more filenames ...

What I get back is a list of 110 files that match the word "invoice" somewhere in their contents. The first hit is a directory of templates created by Office, followed by some mail messages about custom work being done for a customer in my day job. Then there's a document proposing a new tape backup server, and then a page from the (unfortunately discontinued) Oracle CD Bookshelf that I've copied to my local hard drive. Digging through 110 filenames, especially if I have to open them to see what's inside, would be tedious.

I'll narrow down the search by adding terms. Since I worked on the book for Apress, I'll add that as a keyword. All words specified in a search term are ANDed together. Since I'm passing multiple words, and I want the Unix shell to pass them as one argument to mdfind, I need to put them in double quotes.

$ mdfind "invoice apress"
/Users/alester/pro-perl-debugging/admin/TR.Invoice.Lester.Foley and McMahon.doc

Now we're down to two hits, and it's clear I want the first file.

Boolean Operators

All words passed in a query string to mdfind are implicitly ANDed together. That is, "invoice apress" means both words must appear. Spotlight allows other Boolean operators as well:

    |      OR: either term may appear
    -      NOT: exclude the term that follows
    ( )    grouping

Working with these operators can be tricky. Whitespace is significant when building queries. To get all documents with "invoice" or "o'reilly", I write

$ mdfind "invoice|o'reilly"

with no spaces between the terms. If I want to find all documents with "invoice" but not "apress", it's

$ mdfind "invoice(-apress)"

with no intervening spaces, and parentheses around the term I want to exclude. To get a list of invoices or contracts from O'Reilly, I'd use

$ mdfind "(invoice|contract) o'reilly"
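Because whitespace matters so much in these expressions, it can help to let the shell assemble them. Here's a minimal sketch of that idea. The helper name or_terms is my own invention for illustration, not something mdfind provides, and the search at the end only runs on a machine where mdfind is installed.

```shell
# or_terms (hypothetical helper): join the arguments with "|" and wrap
# them in parentheses, leaving no spaces inside the group.
or_terms() (
    IFS='|'
    printf '(%s)\n' "$*"
)

# Invoices or contracts that also mention O'Reilly.
query="$(or_terms invoice contract) o'reilly"
echo "$query"    # prints: (invoice|contract) o'reilly

# Run the search only where mdfind exists.
if command -v mdfind >/dev/null 2>&1; then
    mdfind "$query"
fi
```

The function body runs in a subshell, so changing IFS there doesn't leak into the rest of the session.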

Note that in all these examples that have more than a single word, I'm using double quotes around the search term. This makes the Unix shell pass our multiple words as a single parameter. Otherwise, mdfind uses only the last word, so that

$ mdfind invoice contract

is the same as

$ mdfind contract

It also prevents the shell from intercepting characters that it would use as special, like the parentheses, and passes them unmolested to mdfind. This is especially important if I try to search for "O'Reilly". Without quotes, I get this:

$ mdfind O'Reilly
>

The angle bracket is the shell telling me "You started a quoted string, and now I'm waiting for you to finish it." It will sit and wait for input until it sees another single quote, or I type Ctrl-C to cancel. The shell has interpreted the single quote in "O'Reilly" as the start of a quoted string. Instead, I want

$ mdfind "O'Reilly"
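To see what the quotes are doing without involving Spotlight at all, I can ask the shell how many arguments it would hand over. This is only a sketch. The count_args function is a throwaway helper I made up for the demonstration.

```shell
# count_args (hypothetical helper): report how many arguments arrived.
count_args() {
    echo "$#"
}

count_args invoice contract                 # prints: 2
count_args "invoice contract"               # prints: 1
count_args "(invoice|contract) o'reilly"    # prints: 1
count_args "O'Reilly"                       # prints: 1
```

Anything that reports 1 here would reach mdfind as a single query string, with the parentheses, pipe, and apostrophe intact.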

Related Reading

Mac OS X Tiger in a Nutshell
A Desktop Quick Reference
By Andy Lester, Chris Stone, Chuck Toporek, Jason McIntosh

Narrowing Search Results

Sometimes I know the keywords to search for, but there are too many files on my drive to wade through. That's when I turn to mdfind's handy -onlyin option. It restricts the files returned to files in a specific directory, and the directories below it. This may also speed up searching significantly, since mdfind only has to search a small part of its index for my files.

If I know my invoice is probably somewhere in my home directory's subdirectory called writing, I can use

$ mdfind -onlyin /Users/alester/writing invoice

or, using the tilde character to tell the shell to expand to my home directory, shorten it as

$ mdfind -onlyin ~/writing invoice

I can have multiple -onlyin options, so if I have a folder of stuff I've been meaning to file away, I can include that in my search with:

$ mdfind -onlyin ~/writing -onlyin "/Users/alester/to be filed" invoice

Note that because of the spaces in /Users/alester/to be filed I must put the pathname in double quotes. This also means I can't use the tilde shortcut, because the tilde is a shell character that won't be expanded in quotes.
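There are a couple of ways around the tilde problem that avoid typing the full path. The sketch below shows two that I'd expect to work in any Bourne-style shell. The variable names are mine, and the "to be filed" folder is just the example from above.

```shell
# Two spellings of the same directory that survive the embedded spaces.
with_home="$HOME/to be filed"    # $HOME still expands inside double quotes
with_tilde=~/"to be filed"       # tilde sits outside the quotes, so it expands

echo "$with_home"
echo "$with_tilde"

# A quoted tilde stays a literal "~" and names no real directory.
quoted="~/to be filed"
echo "$quoted"                   # prints: ~/to be filed
```

Either of the first two values can be passed along as -onlyin "$with_home" or -onlyin "$with_tilde".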

Unfortunately, mdfind doesn't understand the period to mean "the current directory," but I can use the special shell variable $PWD instead:

$ cd ~/Music/iTunes
$ mdfind -onlyin "$PWD" "East Bound And Down"

Filtering mdfind's output

Another way to narrow the number of hits is to analyze the output of mdfind. Unix has many programs called filters that take output from one program, analyze or modify it, and create a different set of output.

One of the most common filters is the grep command, which searches a set of input for lines that match a given pattern. In this case, I want grep to show me only lines that have the word "Perl" in them somewhere:

$ mdfind invoice | grep Perl

That didn't return the results I want. I'll try it again with the -i flag to tell grep to do case-insensitive matching.

$ mdfind invoice | grep -i Perl
/Users/alester/pro-perl-debugging/admin/TR.Invoice.Lester.Foley and McMahon.doc

Now I've found the results I wanted. I could have rerun it with grep perl, but then I would have missed results that might have been spelled "Perl".

A downside of this technique is that grep sees only the pathnames that mdfind prints, not the contents of the files. Even though both of these books I worked on were for Apress, if I'd tried to grep on "Apress", I'd have come up empty, because "Apress" doesn't appear in any of the file or directory names.

$ mdfind invoice | grep -i apress

Another useful grep option is -v. It tells grep to show only lines that do not match the expression. For example, if I want to exclude all the results from my IMAP Mail account, I can use

$ mdfind invoice | grep -v IMAP

and I won't see any results where "IMAP" appears in the filename or directory. This can be dangerous, since if I've invoiced the mythical HandiMap company and saved it as /Users/alester/HANDIMAP/invoice-2005.doc, it will be excluded from the results, too.
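Here's a small experiment that shows the problem and one way to tighten the pattern. It's a sketch under assumptions: the three paths are made up, sample_paths is a stand-in for the output of mdfind invoice, and I'm assuming the mail account lives in a directory whose name begins with "IMAP-".

```shell
# sample_paths stands in for "mdfind invoice"; every path is hypothetical.
sample_paths() {
    cat <<'EOF'
/Users/alester/Library/Mail/IMAP-alester@example.com/INBOX/1234.emlx
/Users/alester/HANDIMAP/invoice-2005.doc
/Users/alester/writing/apress/invoice.doc
EOF
}

# Too broad: "IMAP" also matches inside "HANDIMAP", so only one path is left.
sample_paths | grep -v IMAP

# Narrower: skip only a directory whose name starts with "IMAP-".
sample_paths | grep -v '/IMAP-'
```

The second filter keeps the HandiMap invoice because the pattern now has to match at the start of a directory name.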

Another handy Unix tool is the program wc. wc stands for "word count," but with the -l flag it shows me the number of lines in the input passed to it.

$ mdfind invoice | wc -l

Here I find that mdfind returned 110 files matching "invoice". Surely you didn't think I would have counted all 110 file matches at the beginning of this section by hand, did you?

Listing Metadata with mdls

The mdls command is the partner to mdfind. The ls portion of mdls is an analogue to the Unix command ls which lists files in a directory. In this case, mdls lists the metadata attributes associated with a given file.

Here's the metadata for a Word document I created when updating Mac OS X Tiger In A Nutshell.

$ mdls ~/mosxnut3/ch02-addendum.doc
ch02-addendum.doc -------------
kMDItemAttributeChangeDate     = 2005-12-11 21:45:50 -0600
kMDItemAuthors                 = ("Andy Lester")
kMDItemContentCreationDate     = 2005-09-14 21:26:58 -0500
kMDItemContentModificationDate = 2005-09-14 21:26:58 -0500
kMDItemContentType             = ""
kMDItemContentTypeTree         = ("", "",
kMDItemDisplayName             = "ch02-addendum.doc"
kMDItemFSContentChangeDate     = 2005-09-14 21:26:58 -0500
kMDItemFSCreationDate          = 2005-09-14 21:26:58 -0500
kMDItemFSCreatorCode           = 0
kMDItemFSFinderFlags           = 0
kMDItemFSInvisible             = 0
kMDItemFSIsExtensionHidden     = 0
kMDItemFSLabel                 = 0
kMDItemFSName                  = "ch02-addendum.doc"
kMDItemFSNodeCount             = 0
kMDItemFSOwnerGroupID          = 501
kMDItemFSOwnerUserID           = 501
kMDItemFSSize                  = 68382
kMDItemFSTypeCode              = 0
kMDItemID                      = 246252
kMDItemKeywords                = (Tiger, Nutshell, macosx)
kMDItemKind                    = "Microsoft Word document"
kMDItemLastUsedDate            = 2005-12-11 21:45:47 -0600
kMDItemTitle                   = "Mac OS X Tiger In A Nutshell -- Chapter
    2 -- additional commands"
kMDItemUsedDates               = (2005-09-14 21:26:58 -0500, 2005-12-11
    18:00:00 -0600)

The attribute names should be pretty self-explanatory. FS refers to the filesystem, the name for how files are stored on the hard drive, so all the kMDItemFS attributes give information about the files themselves, and not the content. Note that this may be different than information held internally in a specific format.

Each different file format may have specific information unique to that format. The values Tiger, Nutshell, and macosx were entered by me in Microsoft Word in File Properties, which Spotlight then indexed into the kMDItemKeywords attribute. Some metadata is figured out by Spotlight itself, as with the dimensions of a JPEG image.

The attributes for a media file are very different.

$ mdls "05 Power Of Two.m4a"
05 Power Of Two.m4a -------------
kMDItemAlbum                    = "Swamp Ophelia"
kMDItemAttributeChangeDate      = 2005-11-03 19:08:52 -0600
kMDItemAudioBitRate             = 112024
kMDItemAudioChannelCount        = 2
kMDItemAudioEncodingApplication = "iTunes v6.0.1, QuickTime 7.0.3"
kMDItemAudioTrackNumber         = 5
kMDItemAuthors                  = ("Indigo Girls")
kMDItemCodecs                   = (AAC)
kMDItemComposer                 = "Saliers, Emily"
kMDItemContentCreationDate      = 2005-11-03 19:08:28 -0600
kMDItemContentModificationDate  = 2005-11-03 19:08:52 -0600
kMDItemContentType              = "public.mpeg-4-audio"
kMDItemContentTypeTree          = (
kMDItemDisplayName              = "05 Power Of Two.m4a"
kMDItemDurationSeconds          = 322.5483900226757
kMDItemFSContentChangeDate      = 2005-11-03 19:08:52 -0600
kMDItemFSCreationDate           = 2005-11-03 19:08:28 -0600
kMDItemFSCreatorCode            = 1752133483
kMDItemFSFinderFlags            = 0
kMDItemFSInvisible              = 0
kMDItemFSIsExtensionHidden      = 0
kMDItemFSLabel                  = 0
kMDItemFSName                   = "05 Power Of Two.m4a"
kMDItemFSNodeCount              = 0
kMDItemFSOwnerGroupID           = 20
kMDItemFSOwnerUserID            = 501
kMDItemFSSize                   = 4582797
kMDItemFSTypeCode               = 1295270176
kMDItemID                       = 2099725
kMDItemKind                     = "MPEG-4 Audio File"
kMDItemLastUsedDate             = 2005-11-03 19:08:29 -0600
kMDItemMediaTypes               = (Sound)
kMDItemMusicalGenre             = "Rock"
kMDItemStreamable               = 0
kMDItemTitle                    = "Power Of Two"
kMDItemTotalBitRate             = 112024
kMDItemUsedDates                = (2005-11-03 19:08:29 -0600)

If I'm only interested in certain attributes, I can use the -name option:

$ mdls -name kMDItemComposer "11 Space Truckin'.m4a"
11 Space Truckin'.m4a -------------
kMDItemComposer = "Blackmore/Gillan/Glover/Lord/Paice"
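When I want just the value, say to feed it to another program, the header line and the quotes get in the way. Here's a rough sketch of a filter that trims them. The function name strip_mdls is hypothetical, and the filter assumes the one-attribute-per-line layout shown above, so multi-line values such as long lists would need more care.

```shell
# strip_mdls (hypothetical helper): keep only the attribute lines from
# mdls output and print each value without its surrounding double quotes.
strip_mdls() {
    grep '^kMDItem' | sed -e 's/^[^=]*= *//' -e 's/^"//' -e 's/"$//'
}

# Feed it the two lines of output from the example above.
printf '%s\n' \
    "11 Space Truckin'.m4a -------------" \
    'kMDItemComposer = "Blackmore/Gillan/Glover/Lord/Paice"' |
    strip_mdls
# prints: Blackmore/Gillan/Glover/Lord/Paice
```

On a Mac, the printf would be replaced by the real command, as in mdls -name kMDItemComposer "11 Space Truckin'.m4a" | strip_mdls.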

Now that I know some attribute names, I can get very precise in how I search. Say I want to find songs composed by Roger Waters. I need to search the kMDItemComposer attribute for "Waters". I'll put the string I'm searching for in double quotes, and then the entire search expression in single quotes.

$ mdfind 'kMDItemComposer = "Waters"'
/Users/andy/Music/iTunes/iTunes Music/Pink Floyd/
    Pulse/2-06 Money.m4a
/Users/andy/Music/iTunes/iTunes Music/Pink Floyd/
    Pulse/2-09 Brain Damage.m4a
/Users/andy/Music/iTunes/iTunes Music/Pink Floyd/
    Pulse/2-10 Eclipse.m4a

I know that I have more than three songs written by Roger Waters, so I'll rerun the search with wildcards, with an asterisk to mean "any string."

$ mdfind 'kMDItemComposer = "*Waters*"'
/Users/andy/Music/iTunes/iTunes Music/Pink Floyd/
    Animals/02 Dogs.m4a
/Users/andy/Music/iTunes/iTunes Music/Pink Floyd/
    Dark Side Of The Moon/01 Speak To Me _ Breathe.m4a
/Users/andy/Music/iTunes/iTunes Music/Pink Floyd/
    Dark Side Of The Moon/02 On The Run.m4a
... 42 more tracks ...

If I want a case-insensitive search, I can put the letter c outside the double quotes, as in this search to find all forms of "McCartney", regardless of the capitalization.

$ mdfind 'kMDItemComposer = "*mccartney*"c'
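Nesting double quotes inside single quotes gets fiddly once the search term comes from a variable. One possible approach, sketched below, is to let printf assemble the expression. The attr_contains helper is my own made-up name, and it assumes the search term itself contains no double quotes or asterisks.

```shell
# attr_contains (hypothetical helper): build a case-insensitive
# "contains" test for one attribute, using the *...* wildcards and the
# trailing c modifier described above.
attr_contains() {
    printf '%s = "*%s*"c\n' "$1" "$2"
}

query=$(attr_contains kMDItemComposer mccartney)
echo "$query"    # prints: kMDItemComposer = "*mccartney*"c

# Run the search only where mdfind exists.
if command -v mdfind >/dev/null 2>&1; then
    mdfind "$query"
fi
```

Passing "$query" in double quotes hands the whole expression to mdfind as one argument, just as the single-quoted version does.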

Of course, all of this searching for data like composer names depends on the accuracy of the data in the files themselves. Chances are, if you ripped a CD into your iTunes, the data about the music came from an automatic lookup to the Gracenote database, which may or may not have such information entered. If the data's not in the file, then Spotlight can't search against it.

I'm not limited to testing for strings. I can also compare numeric values, with the standard comparison operators. Maybe I want to find all my music files that were sampled at a bit rate lower than 128K:

$ mdfind 'kMDItemAudioBitRate < 128000'

or all the songs longer than 10 minutes

$ mdfind 'kMDItemDurationSeconds > 600'
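The two numeric tests can also be combined, and the shell can do the minutes-to-seconds arithmetic for me. This is a sketch rather than a recipe: I'm assuming that && is how the query language joins two conditions, and the variable names are mine.

```shell
# Long songs that were also encoded at a low bit rate.
minutes=10
seconds=$((minutes * 60))    # 10 minutes is 600 seconds

query="kMDItemDurationSeconds > $seconds && kMDItemAudioBitRate < 128000"
echo "$query"
# prints: kMDItemDurationSeconds > 600 && kMDItemAudioBitRate < 128000

# Run the search only where mdfind exists.
if command -v mdfind >/dev/null 2>&1; then
    mdfind "$query"
fi
```

The query is built in double quotes so that $seconds is expanded. Since it contains no embedded quotes, nothing else needs escaping.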


Remember that mdfind is a file-level utility. It finds files that match, but provides no context for them. It also only provides file-level granularity. For example, since I use Apple's Mail program, which stores individual mail messages as separate files, mdfind returns individual mail messages that match my searches. However, mail programs like Eudora store an entire folder of messages in one file in mbox format. If one message in that box matches a search, mdfind will show the file as a match, but not which message in the file made the match.
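Once mdfind has pointed me at an mbox file, ordinary Unix tools can narrow things down to the message. The sketch below is one possible way, assuming the traditional mbox layout in which every message starts with a line beginning "From ". The sample mailbox and its contents are invented for the demonstration.

```shell
# Build a tiny two-message mbox to experiment on (contents are made up).
mbox=$(mktemp)
cat > "$mbox" <<'EOF'
From alice@example.com Mon Jan  2 10:00:00 2006
Subject: Lunch

See you at noon.

From bob@example.com Tue Jan  3 09:30:00 2006
Subject: Payment

The invoice is attached.
EOF

# Number the messages by their "From " separator lines and report the
# ones that contain the keyword (give the keyword in lowercase).
matches=$(awk -v kw="invoice" '
    /^From / { n++ }
    index(tolower($0), kw) { hit[n] = 1 }
    END { for (i = 1; i <= n; i++) if (hit[i]) print "message " i }
' "$mbox")
echo "$matches"    # prints: message 2

rm -f "$mbox"
```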

I hope you've found this overview of mdfind illuminating. Much of the information has been taken from other articles and comments around the Web, since Apple's documentation on mdfind is so sparse. Here's hoping that a future update to Tiger enhances the documentation.

Appendix A: A Summary of Common Options

From Chapter 2 of Mac OS X Tiger In A Nutshell.

kMDItemAttributeChangeDate      The date and time that a metadata attribute was last changed.
kMDItemAudiences                The intended audience of the file.
kMDItemAuthors                  The authors of the document.
kMDItemCity                     The document's city of origin.
kMDItemComment                  Comments regarding the document.
kMDItemContactKeywords          A list of contacts associated with the document.
kMDItemContentCreationDate      The document's creation date.
kMDItemContentModificationDate  Last modification date of the document.
kMDItemContentType              The qualified content type of the document, such as com.adobe.pdf for PDF files or public.mpeg-4-audio for an MPEG-4 (AAC) audio file.
kMDItemContributors             Contributors to this document.
kMDItemCopyright                The copyright owner.
kMDItemCountry                  The document's country of origin.
kMDItemCoverage                 The scope of the document, such as a geographical location or a period of time.
kMDItemCreator                  The application that created the document.
kMDItemDescription              A description of the document.
kMDItemDueDate                  Due date for the item represented by the document.
kMDItemDurationSeconds          Duration (in seconds) of the document.
kMDItemEmailAddresses           Email addresses associated with this document.
kMDItemEncodingApplications     The name of the application (such as Acrobat Distiller) that was responsible for converting the document to its current form.
kMDItemFinderComment            Any Finder comments for the document.
kMDItemFonts                    Fonts used in the document.
kMDItemHeadline                 A headline-style synopsis of the document.
kMDItemInstantMessageAddresses  IM addresses/screen names associated with the document.
kMDItemInstructions             Special instructions or warnings associated with this document.
kMDItemKeywords                 Keywords associated with the document.
kMDItemKind                     Describes the kind of document, such as an iCal Event.
kMDItemLanguages                Language of the document.
kMDItemLastUsedDate             The date and time the document was last opened.
kMDItemNumberOfPages            Page count of this document.
kMDItemOrganizations            The organization that created the document.
kMDItemPageHeight               Height of the document's page layout in points.
kMDItemPageWidth                Width of the document's page layout in points.
kMDItemPhoneNumbers             Phone numbers associated with the document.
kMDItemProjects                 Names of projects (other documents, such as an iMovie project) that this document is associated with.
kMDItemPublishers               The publisher of the document.
kMDItemRecipients               The recipient of the document.
kMDItemRights                   A link to the statement of rights (such as a Creative Commons or old-school copyright license) that governs the use of the document.
kMDItemSecurityMethod           Encryption method used on the document.
kMDItemStarRating               Rating of the document (as in the iTunes "star" rating).
kMDItemStateOrProvince          The document's state or province of origin.
kMDItemTitle                    The title.
kMDItemVersion                  The version number.
kMDItemWhereFroms               Where the document came from, such as a URI or email address.

Appendix B: Finding Long Songs

Here's a little Perl program to find songs longer than a certain number of minutes, and report on them in a friendly format, in reverse order of length. It uses mdfind to get a list of files for songs over a certain length, and then uses mdls to extract the details, and reports on its findings.

#!/usr/bin/perl -w

use warnings;
use strict;

# Get number of minutes from command line.
my $minutes = shift || 10; # default 10
my $seconds = $minutes * 60;

# Conditions are joined with &&, the query language's AND
my @constraints = (
    "kMDItemDurationSeconds > $seconds",
    'kMDItemMediaTypes == "Sound"',
);
my $mdfind_args = join( " && ", @constraints );

my @filelist = `mdfind '$mdfind_args'`
    or die "You don't have any songs over ",
            "$minutes minutes long!\n";
chomp @filelist; # Remove trailing newlines

my @fileinfo; # List of matching files & stats
for my $filename ( @filelist ) {
    my %fields;

    # Call mdls on the file and scan each line
    foreach ( qx{mdls "$filename"} ) {
        # Find lines with key/value pairs
        if ( /^kMDItem(\w+)\s+=\s+(.*)/ ) {
            # Extract the keys and values
            my ($key,$value) = ($1,$2);

            # Strip surrounding parens & quotes
            $value =~ s/^\(|\)$//g;
            $value =~ s/^"|"$//g;

            # Stash the key/value pair
            $fields{$key} = $value;
        } # if key/value line
    } # for each mdls call
    push( @fileinfo, \%fields );
} # for each file

# Sort in decreasing order of length
@fileinfo = sort {
    $b->{DurationSeconds} <=> $a->{DurationSeconds}
    } @fileinfo;

# Print the specs for each song
for my $file ( @fileinfo ) {
    my $length = $file->{DurationSeconds};
    printf( qq{%2d:%02d "%s" by %s from "%s"\n},
        $length / 60, $length % 60,
        $file->{Title}, $file->{Authors}, $file->{Album} );
}

$ perl longsongs 9
20:34 "2112" by Rush from "2112"
18:36 "Alice's Restaurant Massacree" by Arlo 
    Guthrie from "The Best Of Arlo Guthrie"
9:16 "Between I And Thou" by The Mermen from 
    "A Glorious Lethal Euphoria"
9:05 "Watermelon In Easter Hay" by Frank Zappa
    from "Joe's Garage"
9:03 "Slow Burn" by Silkworm from "Even A Blind
    Chicken Finds A Kernel Of Corn Now And Then"
9:00 "The Load-Out / Stay" by Jackson Browne from
    "Running On Empty"

Andy Lester is a QA & Release Manager for Socialtext. He is also in charge of PR for The Perl Foundation and maintains over 25 modules on CPAN.


Copyright © 2009 O'Reilly Media, Inc.