Writing a Speech Recognition App. in Carbon

Speech recognition language models

The SRRecognizer object must be taught the words and phrases we want it to recognize. To do this, we assign a speech language model to our recognizer. The function InitLanguageModel in the source file SpeechRecLib.c makes the Speech Recognition Manager calls that install our language model. The language model itself is declared in the header file LangModel.h. Here is the language model for this example:



const char kLangModelName[] = "<Top LM>";
const char *kRecWordList[] = { "Hello", "excuse me", "prime directive",
                               "nine meter humanoid", NULL };

This is a simple list I made up. The recognizer will respond with famous quotes from two popular robots of classic science-fiction film and television. I enjoy using commands for Robby the Robot from "Forbidden Planet" and the B9 environmental control robot from the 1965 series "Lost in Space."

The user utterances are not quite as memorable as the recognizer responses we'll use. But we're all about having fun, right? This is our language model: the key words and phrases we want our speech recognition software to recognize when the user speaks them. When we initialize our application, we pass this information to the InitLanguageModel function as parameters.

OSStatus
InitLanguageModel (const char kLangModelName[],
                   const char *kRecWordList[],
                   long topRefCon, long listRefcon)
{
  OSStatus err = noErr;

  /* create the Top Language Model */
  err = SRNewLanguageModel( objSRsystem, &objSRtlm,
                            kLangModelName, strlen(kLangModelName) );

  /* assign the reference constant to this model;
     we will need it later when we process results */
  if (!err)
    err = SRSetProperty( objSRtlm, kSRRefCon,
                         &topRefCon, sizeof(topRefCon) );

  /* initialize the utterance list */
  if (!err)
  {
    char **currentStr = (char **) kRecWordList;
    while (*currentStr && !err)
    {
      err = SRAddText( objSRtlm, *currentStr,
                       strlen(*currentStr), listRefcon++ );
      ++currentStr;
    }
  }
  return err;
}

This function creates an instance of the SRLanguageModel object. In the Speech Recognition Manager call SRNewLanguageModel, we pass in a reference to the SRSpeechRecognitionSystem object, objSRsystem, and it fills in our SRLanguageModel instance, objSRtlm. The function then iterates through our word list and adds each entry to our top language model object. This code closely follows the Speech Recognition Manager API document (see pages 1-22 and 1-23); I added modifications to make it more modular.

I haven't really discussed the reference constants yet (topRefCon and listRefcon). We pass two constants into InitLanguageModel. The first reference constant identifies the top language model we will be using; it is a constant defined in the file LangModel.h. The second constant is the index of the first utterance in our list of words and phrases to listen for. We will need these values when we retrieve recognition results in our event handler.

The value I will pass in here is kHelloRefCon, from my enumerated list in the file LangModel.h, which indexes the string "Hello" in kRecWordList.

We have our language model in place; now we need to instruct the SRRecognizer to start listening for our user's utterances.

OSStatus EnableSpeechModel (void)
{
  OSStatus err = noErr;

  err = SRSetLanguageModel( objSRrecognizer, objSRtlm );
  if (!err)
    err = SRStartListening( objSRrecognizer );
  return err;
}

In the function EnableSpeechModel, the SRRecognizer object is assigned our language model. The Speech Recognition Manager call SRStartListening is then invoked, and the feedback window is ready to receive input. Just for fun, try saying a few words that are not in our language model. If you were to say the word "nachos," the feedback window would display "???" (see Figure 4), the Speech Recognition Manager's rejected-utterance marker. This is how the feedback window conveys that the utterance was not recognized.

Figure 4. Nachos is not recognized.

Handling recognition events with Carbon

The basic requirement of our speech recognition application is to have the software interact with the user based on spoken phrases from our language model. This means the recognizer must be able to provide feedback based on a user's utterance. The recognizer accomplishes this by posting Apple events. The event handler is a function we provide to handle Apple events of class kAESpeechSuite and type kAESpeechDone.

Please keep in mind we are writing an application to run under Mac OS X. Our application must use Carbon (Mac OS X) event handling and know how to handle Apple (classic) events. The Speech Recognition Manager document clearly explains how to handle Apple events and install your own event handler. For reference, see pages 1-27 through 1-29 of this manual. I will walk you through the code for handling Apple events under the Carbon event architecture. In the last few lines in function main, source file SpeechDrvr.c, you will see the following lines of code:

EventTypeSpec aelist[] = { { kEventClassEPPC, kEventHighLevelEvent } };

InstallEventHandler( GetApplicationEventTarget(),
                     NewEventHandlerUPP( MyAppleEventHandler ),
                     1, aelist, 0, NULL );
AEInstallEventHandler( kAESpeechSuite, kAESpeechDone,
                       NewAEEventHandlerUPP( MyHandleAESpeechDone ),
                       0, false );
RunApplicationEventLoop();

A little philosophy about what is going on in this code is in order. The call InstallEventHandler registers a Carbon event handler that we supply: the function MyAppleEventHandler. Is your head spinning yet? Not to worry; here is some clarification. Many APIs have been ported directly to Carbon, and Apple event handling for Speech Recognition is one of them. The classic way to handle this event is to write a custom event loop and query for events with the classic call WaitNextEvent. Using a switch statement, we would examine the events for the case kHighLevelEvent and then call AEProcessAppleEvent. That is how we would invoke MyHandleAESpeechDone in the Apple (classic) event model. None of that is necessary under Carbon.

Under the Carbon event model, we provide a list of EventTypeSpec where the operating system listens for events. We are waiting for a Carbon event of class kEventClassEPPC and type kEventHighLevelEvent. If this event occurs, our registered handler, MyAppleEventHandler, is called to handle Apple events. The code for our registered Carbon event handler for Apple events looks like:

static pascal OSStatus MyAppleEventHandler( EventHandlerCallRef myAEHandler,
                                            EventRef inEvent, void *userData )
{
  if ( GetEventClass( inEvent ) == kEventClassEPPC &&
       GetEventKind( inEvent ) == kEventHighLevelEvent )
  {
    EventRecord er;
    ConvertEventRefToEventRecord( inEvent, &er );
    return AEProcessAppleEvent( &er );
  }
  /* not our event; let the next handler have a look */
  return eventNotHandledErr;
}

A key point here is that we did not have to write a WaitNextEvent loop routine with a complex switch statement to handle individual events. The Carbon event function, RunApplicationEventLoop, handles registered events for us. When a Carbon event of type kEventHighLevelEvent occurs, our handler, MyAppleEventHandler, is called.

This triggers AEProcessAppleEvent to process the Apple event that the recognizer posted. It's sort of a dual personality we're dealing with, between Carbon events and classic Apple events. AEProcessAppleEvent in turn triggers our handler, MyHandleAESpeechDone, when the Apple event is kAESpeechDone. If you don't understand this at first, don't worry; this dual handling of Carbon and Apple events is confusing for most developers as well.
