Jun 6

Machine Voice Recognition Becomes More Human

In a typical day, we each probably have dozens (and sometimes many dozens) of conversations.  Part of the natural art of conversation is our ability to speak and listen to a single individual, even when there is background noise such as music playing or other conversations in progress.  We instinctively focus our attention on the person in front of us, and subconsciously filter out the background noise.  We can do that even when the background noise is quite loud – possibly louder than the conversation in which we are engaged.

Implementing that same capability in a machine voice recognition system is quite challenging.  After all, the machine doesn’t have the advantage of all the visual cues that most humans can process in a typical conversation – so how does it know when to “pay attention” to a speaker’s request?  And how does that same system manage overlapping speech or background music to figure out which words are important?

It turns out that two separate solutions working in tandem are required to solve these problems for machine-based voice recognition systems.  The first one is probably familiar to those of us who use systems such as Amazon's Alexa, Google Home, or Apple's Siri in hands-free mode.  That is to define certain keyword voice triggers like "Alexa", "Okay Google" or "Hey Siri".  Those triggers tell the device to "pay attention to what I am about to say next."
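To make the trigger idea concrete, here is a toy sketch of the control flow in Python. Real systems run an acoustic keyword spotter on the audio stream itself (e.g. Sensory's TrulyHandsfree); this illustration works on a pre-transcribed token stream purely for readability, and the `<pause>` marker standing in for end-of-utterance detection is an assumption of the sketch.

```python
def extract_commands(tokens, trigger="alexa"):
    """Toy illustration of keyword triggering: ignore everything until
    the trigger word appears, then treat the rest of that utterance
    (up to a pause marker) as the command to act on."""
    commands, capturing, current = [], False, []
    for tok in tokens:
        if tok == "<pause>":              # end of an utterance
            if capturing and current:
                commands.append(" ".join(current))
            capturing, current = False, []
        elif capturing:
            current.append(tok)           # words after the trigger matter
        elif tok.lower() == trigger:
            capturing = True              # "pay attention to what follows"
    if capturing and current:             # utterance ended without a pause
        commands.append(" ".join(current))
    return commands

# Background chatter is ignored; only speech after the trigger is kept.
speech = "turn it up alexa play music <pause> nice song".split()
print(extract_commands(speech))           # → ['play music']
```

The same gating logic is what lets an always-listening device stay idle and power-efficient: nothing downstream runs until the lightweight trigger detector fires.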

However, voice triggers alone aren't enough.  What if my phone's ring tone has been activated by an incoming call, or what if I'm already playing music when I try to use the voice trigger with my device?  That's where the second solution comes into play.  It's called "Acoustic Echo Cancellation" (or "AEC" for short).  AEC enables voice recognition systems to maintain their accuracy even when the background noise is created by the listening device's own speaker.
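The key insight behind AEC is that the device already knows exactly what it is playing, so it can estimate how that playback arrives at its own microphone and subtract it. A common way to do this is with a normalized LMS (NLMS) adaptive filter. The sketch below is a minimal illustration of that general technique, not QuickLogic's implementation; the function name, tap count, and step size are all assumptions chosen for the example.

```python
import numpy as np

def aec_nlms(far_end, mic, taps=64, mu=0.5, eps=1e-6):
    """Illustrative NLMS echo canceller: adaptively model the echo path
    from the device's own playback (`far_end`) to its microphone, and
    subtract the estimated echo from the `mic` signal."""
    w = np.zeros(taps)               # adaptive estimate of the echo path
    buf = np.zeros(taps)             # most recent far-end samples
    out = np.zeros(len(mic))
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = far_end[n]
        echo_est = w @ buf           # predicted echo at the microphone
        e = mic[n] - echo_est        # residual: mic minus estimated echo
        # NLMS update, normalized by input power for stable convergence
        w += (mu / (buf @ buf + eps)) * e * buf
        out[n] = e
    return out

# Simulate an echo path: playback returns delayed and attenuated.
rng = np.random.default_rng(0)
playback = rng.standard_normal(4000)
mic = 0.6 * np.concatenate([np.zeros(8), playback[:-8]])
residual = aec_nlms(playback, mic)
# After the filter converges, the residual energy is far below the echo.
```

After convergence, whatever remains in the residual is the part of the microphone signal the device did *not* play itself, i.e. the user's voice, which is what gets handed to the trigger detector and recognizer.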

What does all of this have to do with QuickLogic?  The answer is that we've just added this capability to our EOS™ S3 Sensor Processing SoC.  We call it a "voice barge-in" feature, and it enables systems using the EOS S3 device to recognize voice triggers more naturally and dependably when background noise is present.  Our new AEC technology works together with the device's integrated Sensory TrulyHandsfree speech recognizer as well as cloud-based speech recognition systems.  The voice barge-in feature is a big step forward in improving the quality of the human experience when interacting with a wide variety of voice interface devices, including smartphones and headsets.

Although our approach to implementing this capability is quite different from the way a human processes speech and determines the important content, the net result is a more human-like interaction with both local and cloud-based voice recognition machinery.  That machinery is becoming increasingly capable and ubiquitous, and it represents nothing less than a full-blown revolution in how we humans interact with our devices.  We at QuickLogic are proud to be playing our part.
