Published on O'Reilly Network (http://www.oreillynet.com/)


Writing a Speech Recognition App. in Carbon

by Michael J. Norton
07/13/2000


Giving a machine the ability to converse with people has been the subject of many science fiction stories. There was the ominous Robby the Robot in the 1956 MGM film "Forbidden Planet," constructed from the marvels of Krell technology. Don't forget the friendly and protective B9 robot from "Lost in Space" that every child of the sixties wished they owned. And how about the devious HAL 9000, which was nothing more than a camera lens, yet it spoke and taunted the astronaut Dave in "2001: A Space Odyssey."

As a child, I was mesmerized by these science-fiction machines that could talk. With dates like 1997 for "Lost in Space" and 2001, I was given hope that these contraptions would someday exist in my future. It was a technology that would someday come, and that someday crept up on all of us silently. In fact, this technology has been shipping on Apple Macintosh computers for at least five years now. To my dismay, however, there are still no talking space robots.

With the advent of G4 supercomputers, the anxiously awaited Unix Mach kernel-based Mac OS X operating system, multimedia capabilities, and speech recognition, perhaps we are within the grasp of Krell technology. All the ingredients are certainly present to make our system interact with us vocally. Maybe all we really need is a road map on how to use the new technology.

The Speech Recognition Manager

The Mac operating system ships with speech recognition and text-to-speech capabilities. PlainTalk is the software package that provides the operating system with the functionality required for your Macintosh to respond to your voice and to speak written text. The PlainTalk package comprises two software components: the Speech Recognition Manager and the Speech Synthesis Manager. The Speech Recognition Manager listens for our spoken words and phrases. The Speech Synthesis Manager allows the Mac to speak back to the user.

The Speech Recognition Manager and Speech Synthesis Manager are standard packages that ship with current versions of the Mac OS. If you're running Mac OS 8 or greater, your operating system supports these managers. If you need a current image of them, you can download it from the Apple Speech web site. To see if your system has Speakable Items, select Speech from the pull-down control panel menu (see Figure 1). For my new G4 system, I did need to download the Speech Recognition Manager; Speakable Items wasn't present in my operating system's Speech control panel options menu.

Figure 1. Speech Control Panel.

Speech recognizer

To use Apple's speech recognition software you must open a recognition system. We'll soon explore how to code a recognition system; for the time being, you can open a simple one from your control panel. Select the Speech control panel and enable speech by turning Speakable Items on. After selecting the On radio button, you should immediately hear the Speech Synthesis Manager informing you verbally that the Speakable Items software is running. A window will open: the feedback window, a component of the Speech Recognition Manager. The feedback window on my desktop is shown as an example in Figure 2.

Figure 2. Feedback window.

The Speakable Items components are fully configurable through the control panel so you can customize the character in the window and its synthetic voice. The components you select in this window will also be used when you start playing with your speech recognition code.

Downloading the Speech Recognition Manager SDK

Developing applications that use the Speech Recognition Manager requires the Speech Recognition Manager SDK, which is free from Apple Computer. The development libraries in this SDK assume that you are using Metrowerks CodeWarrior. You can also use the MPW development environment, which is likewise available free from Apple; some mangling and recompiling of the headers and libraries is required, but MPW can be used.

The major difference is that Metrowerks CodeWarrior is a full-featured integrated development environment, while MPW is command-line and script based. A strong knowledge of makefiles is a must for using MPW. The example provided for this article uses Metrowerks CodeWarrior; however, the source code will build under both environments. Keep in mind that with Mac OS X, Apple has no intention of maintaining MPW.

While you're downloading, be sure to pick up a copy of the Speech Recognition Manager documentation available from Apple in Adobe Acrobat PDF format.

Speech objects

The Speech Recognition Manager is object oriented in design. The parent class of the Speech Recognition Manager is SRSpeechObject. A speech object is an instance of one of its subclasses: an SRRecognitionSystem object, an SRRecognizer object, an SRSpeechSource object, or an SRLanguageObject language model object.

SRRecognitionSystem object

We will create an SRRecognitionSystem object when we initialize our speech recognition application. This object is used to open and close the speech recognition system we will be using.

SRRecognizer object

The SRRecognizer object is the attentive component of our speech application. When a user speaks into the microphone, this object listens for an utterance. The utterance is processed and the SRRecognizer determines if it is recognizable or gibberish. The result is processed and then sent to our application.

SRLanguageObject object

Our speech application will need to be instructed which words, phrases, and complex phrases to listen for. The list of items we wish to listen for will be maintained by the SRLanguageObject instance. The items in this list are instances of SRLanguageObject subclasses: SRWord for a word, SRPhrase for a phrase, and SRLanguageModel for a complex phrase.
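
As a reference point, here is a sketch of how these speech objects might be declared as globals in SpeechRecLib.c. The object names match the code later in this article; the declarations themselves are my assumption.

#include <SpeechRecognition.h>

/* speech objects used throughout SpeechRecLib.c (a sketch; the names
   match the article's code, the declarations themselves are assumed) */
static SRRecognitionSystem objSRsystem;     /* opens and closes the recognition system */
static SRRecognizer        objSRrecognizer; /* listens for user utterances */
static SRLanguageModel     objSRtlm;        /* the top language model */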

Using the Speech Recognition Manager API

The example code provided is intended for Mac OS X portability and uses the Carbon Events calling conventions. You can still develop code for Carbon on Mac OS 9 with the Carbon SDK; using the CarbonStub9 library, Carbon applications can run under Mac OS 9. For simplicity, I merely made a copy of the BasicCarbEvents project, found in the Sample Code folder, and inserted my own code. The speech recognition code is built around this example. Figure 3 shows what my project looks like in Metrowerks CodeWarrior 5.

Figure 3. Using the BasicCarbEvents project.

Initializing the Speech Recognition Manager

The Speech Recognition Manager API, as it is currently available for Mac OS 9, is portable to Mac OS X. Preliminary documentation on the Speech Recognition Manager API for Carbon is available in HTML format. Therefore, the initialization of the SRRecognitionSystem and SRRecognizer objects is pretty much as it appears in the original pre-Carbon Speech Recognition Manager PDF document I referenced earlier. To create a speech object instance in our application, we have the following code:

OSStatus InitSpeech (void)
{
  OSStatus err = noErr;
  long currVersion;
  short feedback = kSRHasFeedbackHasListenModes;

  /* check for a valid Speech Recognition Manager */
  err = Gestalt(gestaltSpeechRecognitionVersion, &currVersion);
  if (err)
    return kBadSRMVersion;
  if (currVersion < kMinSRVersion)
    return kBadSRMVersion;
  /* instantiate the SR system object */
  err = SROpenRecognitionSystem(&objSRsystem,
      kSRDefaultRecognitionSystemID);
  /* use the standard feedback window and listening modes */
  if (!err)
    err = SRSetProperty(objSRsystem, kSRFeedbackAndListeningModes,
        &feedback, sizeof(feedback));
  /* instantiate a speech recognizer object */
  if (!err)
    err = SRNewRecognizer(objSRsystem, &objSRrecognizer,
        kSRDefaultSpeechSource);
  return err;
}

The InitSpeech function checks for a valid version of the Speech Recognition Manager installed in the operating system. If the version is 1.5 or greater, the code then creates a speech object instance with a call to SROpenRecognitionSystem. My code uses the variable objSRsystem for the SRRecognitionSystem object and objSRrecognizer for the SRRecognizer object. You will want to refer to the SpeechRecLib.c file for other variable declarations.

The header file SpeechRecLib.h has no variable declarations; it only contains the prototypes for the function calls exported by the library -- a simple practice for variable protection. When our application calls InitSpeech, the call to SRSetProperty tells the Speech Recognition Manager the type of listening mode we want and that we will be using a feedback window. This is all we need to do to initialize the Speech Recognition Manager toolbox. Now we'll need to teach our speech object which words to listen for.
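
For reference, here is a sketch of what SpeechRecLib.h might look like. The article only names the exported functions, so the exact contents are assumed.

/* SpeechRecLib.h -- a sketch containing only the prototypes for the
   exported library calls, per the practice described above */
#ifndef __SPEECHRECLIB__
#define __SPEECHRECLIB__

#include <SpeechRecognition.h>

OSStatus InitSpeech (void);
OSStatus InitLanguageModel (const char kLangModelName[],
    const char *kRecWordList[], long topRefCon, long listRefcon);
OSStatus EnableSpeechModel (void);

#endif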

Speech recognition language models

The SRRecognizer object must be taught which words and phrases we want recognized. To do this, we assign a speech language model to our recognizer. The function InitLanguageModel in source file SpeechRecLib.c provides the Speech Recognition Manager calls to install our language model. The language model itself is declared in the header file LangModel.h. Here is our language model for this example:

const char kLangModelName[] = "<Top LM>";
const char *kRecWordList[] =
  { "Hello", "excuse me", "prime directive",
    "nine meter humanoid", NULL };

This is a simple list I made up. The recognizer will respond with famous quotes from two popular robots in classic science-fiction film and television. I enjoy using commands for Robby the Robot from "Forbidden Planet" and the B9 environmental control robot from the 1965 series "Lost In Space."

The user utterances are not quite as memorable as the recognizer responses we'll use. But, we're all about having fun, right? This is our language model -- the key words and phrases we want our speech recognition software to recognize when the user speaks them. When we initialize our application, we will pass this information into the InitLanguageModel function as parameters.

OSStatus
InitLanguageModel (const char kLangModelName[],
    const char *kRecWordList[], long topRefCon, long listRefcon)
{
  OSStatus err = noErr;

  /* set the top language model */
  err = SRNewLanguageModel(objSRsystem, &objSRtlm,
      kLangModelName, strlen(kLangModelName));
  /* assign the reference constant to this model; we will need it
     later when we process results */
  if (!err)
    err = SRSetProperty(objSRtlm, kSRRefCon,
        &topRefCon, sizeof(topRefCon));
  /* initialize the utterance list */
  if (!err)
  {
    char **currentStr = (char **) kRecWordList;
    while (*currentStr && !err)
    {
      err = SRAddText(objSRtlm, *currentStr,
          strlen(*currentStr), listRefcon++);
      ++currentStr;
    }
  }
  return err;
}

This function creates an instance of the object SRLanguageModel. In the Speech Recognition Manager call SRNewLanguageModel, we pass in a reference to the SRRecognitionSystem object, objSRsystem, and it assigns us an instance of SRLanguageModel, objSRtlm. The function then iterates through our language model and adds this information to our top language model object. This code pretty much reflects the Speech Recognition Manager API document; see pages 1-22 and 1-23. I added modifications to make the code more modular.

I haven't really discussed the reference constants yet (topRefCon and listRefcon). We pass in two constants when we call InitLanguageModel. The first reference constant identifies the top language model we will be using; it's a constant defined in file LangModel.h. The second constant is the index of the first utterance in our list of words or phrases to listen for. We will need these values when we retrieve recognition results in our event handler.

The value I will pass in here is kHelloRefCon, from my enumerated list in file LangModel.h, which indexes the string "Hello" from kRecWordList.
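
To make that concrete, here is a sketch of what the enumerated list in LangModel.h might look like. Only kHelloRefCon is named in this article, so the remaining constant names and values are my assumptions, ordered to match kRecWordList.

/* LangModel.h (sketch) -- reference constants for the language model.
   Only kHelloRefCon appears in the article; the other names and the
   kTopLMRefCon value are assumed. */
enum {
  kTopLMRefCon = 100,          /* identifies our top language model */
  kHelloRefCon = 0,            /* "Hello" */
  kExcuseMeRefCon,             /* "excuse me" */
  kPrimeDirectiveRefCon,       /* "prime directive" */
  kNineMeterHumanoidRefCon     /* "nine meter humanoid" */
};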

We have our language model in place; now we need to instruct the SRRecognizer to start listening for our user's utterances.

OSStatus EnableSpeechModel (void)
{
  OSStatus err = noErr;

  err = SRSetLanguageModel( objSRrecognizer, objSRtlm );
  if (!err)
    err = SRStartListening( objSRrecognizer );
  return err;
}

In the function EnableSpeechModel, the SRRecognizer object is assigned our language model. The Speech Recognition Manager API call SRStartListening is invoked, and the feedback window is ready to receive input. Just for fun, try saying a few words that are not in our language model. If you were to say the word "nachos," the feedback window would display "???" (see Figure 4), the Speech Recognition Manager's rejected word utterance. This is how the feedback window conveys that the utterance was not recognized.

Figure 4. Nachos is not recognized.
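
Putting the pieces together, here is a sketch of how an application's main function (SpeechDrvr.c) might string these library calls together before entering the event loop. The call order follows this article; the error handling and the kTopLMRefCon constant are my assumptions.

/* a sketch of wiring the SpeechRecLib calls together in main;
   error handling and kTopLMRefCon are assumed */
OSStatus err;

err = InitSpeech();
if (!err)
  err = InitLanguageModel( kLangModelName, kRecWordList,
      kTopLMRefCon, kHelloRefCon );
if (!err)
  err = EnableSpeechModel();
if (err)
  ExitToShell();   /* bail out if speech recognition is unavailable */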

Handling recognition events with Carbon

The basic requirement of our speech recognition application is to have the software interact with the user based on spoken phrases from our language model. This means the recognizer must be able to provide feedback based on a user's utterance. The recognizer accomplishes this by posting notifications to the Apple event handler. The event handler is a function we will provide to handle Apple events of class kAESpeechSuite and type kAESpeechDone.

Please keep in mind we are writing an application to run under Mac OS X. Our application must use Carbon (Mac OS X) event handling and know how to handle Apple (classic) events. The Speech Recognition Manager document clearly explains how to handle Apple events and install your own event handler. For reference, see pages 1-27 through 1-29 of this manual. I will walk you through the code for handling Apple events under the Carbon event architecture. In the last few lines in function main, source file SpeechDrvr.c, you will see the following lines of code:

EventTypeSpec aelist[] = { { kEventClassEPPC, kEventHighLevelEvent } };

InstallEventHandler( GetApplicationEventTarget(),
    NewEventHandlerUPP( MyAppleEventHandler ), 1, aelist, 0, NULL );
AEInstallEventHandler( kAESpeechSuite, kAESpeechDone,
    NewAEEventHandlerUPP( MyHandleAESpeechDone ), 0, false );
RunApplicationEventLoop();

A little philosophy behind what is going on in this code is required. The call to InstallEventHandler registers a Carbon event handler we supply; our Carbon event handler is the function MyAppleEventHandler. Is your head spinning yet? Not to worry, here is some clarification. Many APIs have been ported directly to Carbon, and Apple event handling, which the Speech Recognition Manager relies on, is one of them. The classic way to handle this event is to write a custom event loop and query events with the classic call WaitNextEvent. Using a switch statement, we would examine the events for the case kHighLevelEvent and then call AEProcessAppleEvent. This is how we would invoke MyHandleAESpeechDone in the Apple (classic) event model. This is not necessary under Carbon.
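
For contrast, the classic approach just described would look something like this sketch; gDone is an assumed application quit flag, and our Carbon application needs none of it.

/* the classic (pre-Carbon) event loop described above --
   shown only for contrast; gDone is an assumed quit flag */
EventRecord event;

while (!gDone)
{
  if (WaitNextEvent( everyEvent, &event, 0xFFFFFFFF, NULL ))
  {
    switch (event.what)
    {
      case kHighLevelEvent:
        /* dispatches to our installed Apple event handlers */
        AEProcessAppleEvent( &event );
        break;
      /* ... other classic event cases ... */
    }
  }
}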

Under the Carbon event model, we provide a list of EventTypeSpec entries for which the operating system listens. We are waiting for a Carbon event of class kEventClassEPPC and type kEventHighLevelEvent. When this event occurs, our registered handler, MyAppleEventHandler, is called to handle Apple events. The code for our registered Carbon event handler for Apple events looks like this:

static pascal OSStatus MyAppleEventHandler (EventHandlerCallRef myAEHandler,
    EventRef inEvent, void *userData)
{
  if ( GetEventClass( inEvent ) == kEventClassEPPC
      && GetEventKind( inEvent ) == kEventHighLevelEvent )
  {
    EventRecord er;
    ConvertEventRefToEventRecord( inEvent, &er );
    return AEProcessAppleEvent( &er );
  }
  /* let other handlers see events we don't care about */
  return eventNotHandledErr;
}

A key point here is that we did not have to write a WaitNextEvent loop routine with a complex switch statement to handle individual events. The Carbon event function, RunApplicationEventLoop, handles registered events for us. When a Carbon event of type kEventHighLevelEvent occurs, our handler, MyAppleEventHandler, is called.

This triggers AEProcessAppleEvent to process the Apple event that the recognizer posted. It's sort of a dual personality we're dealing with between Carbon events and classic Apple events. AEProcessAppleEvent triggers our handler, MyHandleAESpeechDone, in the case where the Apple event is kAESpeechDone. If you don't understand this at first, don't worry; this Carbon and Apple event dual handling is confusing for most developers as well.
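
The SpeechEvent.c listing holds the actual handler; here is a sketch of what MyHandleAESpeechDone might look like, assembled from the Speech Recognition Manager documentation. The property-fetching details are my assumptions, and the ExtractResult call is covered in the next section.

/* a sketch of MyHandleAESpeechDone (see the SpeechEvent.c listing for
   the real thing); assembled from the Speech Recognition Manager
   documentation, so the details here are assumptions */
static pascal OSErr MyHandleAESpeechDone (const AppleEvent *theAEevt,
    AppleEvent *reply, long refcon)
{
  OSErr err;
  OSErr recStatus = noErr;
  DescType actualType;
  Size actualSize, len;
  SRRecognitionResult recResult;
  SRLanguageModel resultLM;
  long topRefCon, itemRefCon;

  /* did the recognition attempt itself succeed? */
  err = AEGetParamPtr( theAEevt, keySRSpeechStatus, typeShortInteger,
      &actualType, (Ptr) &recStatus, sizeof(recStatus), &actualSize );
  /* fetch the recognition result object posted by the recognizer */
  if (!err && !recStatus)
    err = AEGetParamPtr( theAEevt, keySRSpeechResult, typeSRSpeechResult,
        &actualType, (Ptr) &recResult, sizeof(recResult), &actualSize );
  if (!err && !recStatus)
  {
    /* the result, viewed as a language model, carries our refCons */
    len = sizeof(resultLM);
    err = SRGetProperty( recResult, kSRLanguageModelFormat, &resultLM, &len );
    if (!err)
    {
      SRLanguageObject item;

      len = sizeof(topRefCon);
      err = SRGetProperty( resultLM, kSRRefCon, &topRefCon, &len );
      if (!err)
        err = SRGetIndexedItem( resultLM, &item, 0 );
      if (!err)
      {
        len = sizeof(itemRefCon);
        err = SRGetProperty( item, kSRRefCon, &itemRefCon, &len );
        if (!err)
          ExtractResult( topRefCon, itemRefCon );
        SRReleaseObject( item );
      }
      SRReleaseObject( resultLM );
    }
    SRReleaseObject( recResult );
  }
  return err;
}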

Extracting recognition results

Now our discussion of language models comes full circle. Earlier, we declared a list of items we wanted our recognizer to listen for. In a similar manner, we need to provide a response for each utterance that is recognized. Our list of pre-defined responses is defined in the header file LangModel.h. Figure 5 shows the feedback window with our responses.

/* pre-defined phrases for the recognizer to respond with */
const char *kResponseList[] = {
  "I am monitored to respond to the name Robby",
  "Can I be of Service?",
  "Destroy the Jupiter 2",
  "Does Not Compute!" };
Figure 5. Feedback window with our responses.

Perhaps some of the old science-fiction stars' quotes will become more recognizable now. When a user speaks the word "Hello," the recognizer will respond with the phrase, "I am monitored to respond to the name Robby."

Each word or phrase assigned in the language model has a one-to-one correspondence with a response in this coding example. Our recognition result extracting function, which is called from MyHandleAESpeechDone (see the listing for SpeechEvent.c), is given a reference number of the language model and a reference index of the utterance spoken. When an event is processed, our handler calls the routine ExtractResult. The two parameters passed to this routine are the reference constants: one identifies the language model, and the other is the index of the word or phrase spoken from our kRecWordList. With the index of the word or phrase, we can play some games and provide responses to the user using our kResponseList. The function ExtractResult achieves this by calling the Speech Recognition Manager routine SRSpeakAndDrawText.
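
Here is a sketch of what ExtractResult might look like, assuming the one-to-one mapping just described and the assumed kTopLMRefCon constant from the earlier LangModel.h sketch.

/* a sketch of ExtractResult, assuming the one-to-one mapping between
   kRecWordList and kResponseList; kTopLMRefCon is the assumed top
   language model constant from the LangModel.h sketch */
OSStatus ExtractResult (long topRefCon, long itemRefCon)
{
  OSStatus err = noErr;

  /* only respond to results from our top language model */
  if (topRefCon == kTopLMRefCon)
  {
    const char *response = kResponseList[itemRefCon - kHelloRefCon];
    /* speak the response and draw it in the feedback window */
    err = SRSpeakAndDrawText( objSRrecognizer, response, strlen(response) );
  }
  return err;
}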

Speech recognition and you

I have presented a very simple language model to get you going in your speech recognition endeavors. This simple code can easily be expanded to include complex phrases. A very comprehensive explanation is available in Apple's develop journal, Issue 27, "The Speech Recognition Manager Revealed," which will walk you through more complex speech models.

I used a snippet of code from that article, namely the function ProcessRecognitionResults, and simplified it for my example function ExtractResult. The SpeechRecLib source is also suited for Carbon coding and Mac OS X; it will run under Mac OS 9 using the Carbon SDK's CarbonStub9 library.

The information I presented is a road map you may use to usher in a future you may have dreamed of. Perhaps you can apply this newfound knowledge to build that space robot you always wanted as a kid. I know I will. See you in the future!

Michael J. Norton is a software engineer at Cisco Systems.


Related Articles

OS X Brings Unix Stability to the Mac

Connecting PCs to Apple's Wireless Airport

Mac OS X Terms and Definitions

