


Are You Talking to Me? Speech on Mac OS X

Can Speech Adapt to My Mistakes?

For Newcomers

Surprisingly, yes. To a certain extent, Speech will try to understand your command even if you do not get it exactly right. For example, "Get my mails" and "Get my mail" work the same way.

However, you should not expect Speech to understand sentences that are too different from what the developer intended. If you think that a command is so unnatural that you won't be able to learn it, you may want to create a custom command that will be more natural to you.

For the Cutting-Edge Addicts

If you're ready to explore the latest developments of the Speech technology, you can turn on Panther's Semantic Inference feature. Under this strange-sounding name hides a technology that allows Speech to understand what you say, even if you do not speak the predefined command.

When this is turned on, you can replace "What time is it?" with "What is the time?", "Tell me the time," or even "How late is it?"

Since this technology is still in the early stages of development, Apple chose to turn it off by default. Its accuracy may not be perfect (yet) and it may slow the speech-recognition engine down a bit. In my experience, however, everything worked perfectly well, so I would encourage you to give it a try.

To do so, follow these steps:

  1. Open the Speech Preferences pane by saying "Open the Speech Preferences."
  2. Go to the Speech Recognition tab.
  3. Select the Commands sub-tab.
  4. Highlight "Global speakable items" and click on Configure.
  5. In the sheet that appears, uncheck the box to turn the feature on (I know, I know...).

To test it, read the sentences suggested by the activation sheet and be amazed.

Going One Step Further

Now that you have discovered the joy of Speech, it's time to go one step further and learn how to almost completely get rid of your keyboard and mouse.

Front Window and Menu Bar Control

By now, you may have noticed that many commands are still out of your reach, including menu items, toolbar buttons, etc. The good news is that you can control them with Speech too, making your keyboard and mouse almost obsolete.

In order to turn this option on, follow these steps:

  1. In the Universal Access preferences pane, click on "Enable access for assistive devices."
  2. In the Speech preferences pane, click on Commands.
  3. Select Front Window and Menu Bar.

Now a whole new world is open to you. Try to say the following commands to show or hide the volume in the menu bar:

  1. Switch to System Preferences.
  2. Show all.
  3. Sound.
  4. Show volume in menu bar.

This gives you a lot of power over your applications and dialog boxes. Unfortunately, some nonstandard controls will not work with this method. Also, you probably will not be able to pick items in complex lists by using Speech. However, most of the functionality of most applications will be available via voice commands.

Even more powerful and more universal is the menu bar, which you can also control by voice. Since almost all menus are standard, you can access most of the menu commands in your applications without any issue.

To shut your Mac down, you would say:

  1. Switch to Finder.
  2. Apple menu.
  3. Shut down.
  4. Shut down.

Define Keyboard Shortcuts

This is all very nice but, sometimes, giving a menu and a menu-item name to perform a simple action can be a bit bothersome. That's why the Speech development team introduced a very nifty command that allows you to enter any keyboard shortcut simply by saying "Define new keyboard shortcut."

A palette will then pop up, allowing you to enter the keyboard shortcut and the voice command you wish to associate with it. You can use such a command to, for example, create a "Close tab" command in Safari or a "New chat with" feature in iChat. Users with disabilities could create a custom command for "Zoom in" and "Zoom out."

Of course, since Panther allows you to define custom shortcuts through the Keyboard preferences pane, this feature is even more powerful than it might seem at first sight.

Better Interactions with Your Mac

Spending your day in front of your screen isn't always fun, as enjoyable as using a Mac can be. Therefore, you may, from time to time, wish to step away from your computer -- when a long task is running, for example -- without losing contact with your Mac in case something important happens.

That's pretty simple. Mac OS X now features "talking alerts" -- this feature will cause your Mac to read aloud the alert messages that pop up on your screen if you do not reply to them after a predefined delay.

This feature can also be very useful in an environment where multiple computers run at the same time -- a print shop or a computer lab in a school. Wouldn't it be nice to hear in a clear, distinctive voice "The PowerMac G5 next to the window needs your attention. The printer is out of paper," instead of a "Bong!" that you would need to track down?

In order to benefit from this feature, use the "Spoken User Interface" tab of the "Speech" preferences.

You can then define what the computer will do and after how long it will talk. I wouldn't recommend that you set a short delay, since having the Mac read the alert while you are already reading and reacting to it may be annoying. Setting it to 10 seconds gives you time to react if you are already in front of the screen.

Your Mac can also read alert windows that, for any reason, would pop up behind your current application or working document.

The "Announce when an application requires your attention" option can also be a time saver. Indeed, while you are working, you may not notice the icons furiously bouncing in your Dock but will certainly hear "Safari needs your attention."

Adding Commands, Folders, or Files

Like many users, your workflow may require you to access documents that are buried in your folder hierarchy. Luckily, you can easily create a "command" that tells Speech to open them in the blink of an eye.

In order to do that, simply create an alias of the folders that you commonly use in the following folder:

[Home] -> Library -> Speech -> Speakable Items

Now, wherever you are, you simply need to say the name of the folder to open it. To make the alias creation process easier, remember that holding the Option and Command (Apple) keys while dragging an icon creates an alias.

Making your own items able to be invoked by speech can itself be achieved by speech. Merely click on the item in the Finder and say, "Make this speakable." Speech will take care of making the alias, putting it in the Speakable Items folder, and removing the word "alias" from the alias.

Of course, you have to be careful not to drop in any alias whose name matches that of an existing command too closely. Otherwise, you may end up opening this folder unintentionally. To avoid this, simply change the name of the alias and all will be well again.

Even cooler, you can put in there aliases to documents that you open often or the HTTP files that Mac OS X creates when you drag a URL from a browser's address bar onto the desktop. Just make sure that you give these files a name that will be relatively easy to pronounce -- for example, remove the extensions if possible or you will have to pronounce "filename dot extension."
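If you have many items to rename, the naming advice above can be sketched as a small shell function. This is only an illustration: the function name `speakable_name` and the sample file names are made up for the example, and the function only prints a suggested name without renaming anything.

```shell
#!/bin/sh
# speakable_name: suggest an easy-to-pronounce name for a speakable item.
# (Hypothetical helper, not part of Mac OS X.)
speakable_name() {
    name=$1
    # Drop a trailing " alias", as Finder-made aliases carry it.
    name=${name% alias}
    # Drop the last extension, but leave dot-files such as ".profile" alone.
    case $name in
        ?*.*) name=${name%.*} ;;
    esac
    printf '%s\n' "$name"
}

speakable_name "Budget.xls"       # prints: Budget
speakable_name "Reports alias"    # prints: Reports
```

You would then rename the alias by hand in the Speakable Items folder, checking first that the new name does not clash with an existing command.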

When All This Is Not Enough

When adding aliases and interacting with buttons or menu items simply is not enough, keep in mind that both AppleScript and the Terminal can work closely with the Speech technology.

For example, here is how to write a script that will read a string of text ...

... in AppleScript:

say "This is something very cool very cool very cool this is something very cool that every Mac can do!" using "Cellos"

... in the Panther Terminal:

say -v Cellos "This is something very cool very cool very cool this is something very cool that every Mac can do"

Note that the voice you pick will be ignored by AppleScript if Speech Recognition is turned on. This is a feature that allows users to enjoy consistency in the dialog they have with their computer.

When using the "saving to" parameter to write the speech to a file, however, the voice you pick is used, since the consistency of the interaction with the user is no longer a concern.

The ability to interact with the Speech Synthesizer even if you are not a developer will allow you to add speech capabilities to the Terminal scripts or AppleScripts that you already use in your daily workflow without having to learn a whole new set of commands or language.
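As a minimal sketch of that idea, the shell function below wraps the `say` command shown earlier so that an existing Terminal script can announce when a long task finishes. The function name `announce`, the `sleep 1` stand-in for the long task, and the messages are assumptions made for illustration. The message is also printed, so the script still behaves sensibly on a machine where `say` is not available.

```shell
#!/bin/sh
# announce: print a message and, when the `say` command exists, speak it too.
# The name "announce" is hypothetical; only `say` comes from the article.
announce() {
    printf '%s\n' "$1"
    if command -v say >/dev/null 2>&1; then
        say "$1"
    fi
}

# Stand-in for a long task; replace `sleep 1` with your own command.
if sleep 1; then
    announce "The long task is finished."
else
    announce "The long task failed."
fi
```

Dropping the `announce` function at the top of a script you already use is enough to make it talk, with no other changes to its logic.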

Getting Your Mac to Listen

Now that your existing scripts have gained the ability to speak to interact with you, wouldn't it be even better if they could listen? Well, Apple already thought of it and all the information that you need to create complex listen-and-tell scripts can be found on this page.

That way, you can create even more complex speakable items that will start a true dialog with you and react depending on your needs and answers.

Have Some Suggestions to Make It All More Exciting?

Indeed, I do! The first thing to do is to over-use the "Show me what to say" command and to try to do as much as you can with Speech. At first, it may look like you are actually losing time since you need to learn the commands and sometimes learn to speak into the microphone.

However, you will very quickly see that you can do almost everything with Speech and completely get rid of meaningless alert sounds, creating a true dialog with your computer.

Many applications are speech-ready -- iChat, for example, can read aloud the names of the people who invite you to a chat, but this option is turned off by default. It is worth taking the time to learn what each one can -- and cannot -- do.

After a few days of practice, I am glad to say that I now can use my Mac without a keyboard or mouse for most of the day, except when typing, of course.

I Want to Create Sounds from Speech Synthesis

On some occasions, you may want to create a sound file from the text generated by the speech engine. The easiest way to do so is to use an AppleScript command like this:

say "This is something very cool very cool very cool this is something very cool that every Mac can do!" using "Cellos" saving to "Cool.aiff"

When you run this script, it creates a file at the root level of your hard drive, containing the sound that you would hear if the synthesis had happened on-the-fly.
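The same thing can probably be done from the Terminal. The sketch below assumes that your version of the `say` command accepts an `-o` option naming an output file, in addition to the `-v` option used earlier; check `man say` on your system before relying on it. The function name `speak_to_file` is made up for the example, and when `say` is not installed the function only prints the command it would have run.

```shell
#!/bin/sh
# speak_to_file: write synthesized speech to an AIFF file instead of playing it.
# Assumes `say` accepts -v (voice) and -o (output file); the helper name is
# hypothetical.
speak_to_file() {
    text=$1
    voice=$2
    out=$3
    if command -v say >/dev/null 2>&1; then
        say -v "$voice" -o "$out" "$text"
    else
        # Not on a Mac: show the command that would have been run.
        printf 'say -v %s -o %s "%s"\n' "$voice" "$out" "$text"
    fi
}

# A relative path is resolved against the current directory of the shell.
speak_to_file "This is something very cool" "Cellos" "Cool.aiff"
```

Unlike the AppleScript version, which puts the file at the root level of your hard drive, a relative path here should land in whatever directory the Terminal is currently in.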

Other Technologies

To achieve the same effect, you can also use the demo pages of the AT&T "Natural Voices" technology. To demonstrate their system, AT&T allows you to type text into a web form and download the resulting file. Its main advantage is that it can read text in many languages.

Here is the demo page. Of course, since there are certain limitations and copyrights that apply, I encourage you to read the Terms and conditions first. You should also keep in mind that this system is targeted at professional frameworks and that it runs on powerful servers.

Author's Note

During the preparation of this article, I had the opportunity to talk with Kim Silverman, principal research scientist, manager, spoken language technologies at Apple. May he find here the expression of my gratitude for the information he so kindly provided.

Needless to say, any errors or inaccuracies in the preceding pages remain entirely my responsibility.

FJ de Kermadec is an author, stylist and entrepreneur in Paris, France.
