
Insight into ASR (Automatic Speech Recognition) & More


Advances in speech recognition and Interactive Voice Response systems (IVRS) have given our machines a personality. They are breakthrough technologies that let machines talk to us and let us respond to them in turn.

Both technologies have proved genuinely useful, and voice recognition has been particularly helpful for people with disabilities. Similarly, text-editing software has played an important part in the development of mobile devices.

Here, we’ll talk about the most common examples of voice recognition you are likely to have come across: Samsung S-Voice and Apple Siri. Let me explain how Siri responds so accurately, even when you ask a personal question or want relationship advice. Siri uses ASR and IVR technologies to give its responses a personal touch.

Overview of ASR (Automatic Speech Recognition)

The core process of automatic speech recognition follows a few fairly straightforward steps to produce a meaningful response:
  1. Speak to your ASR device (for example, by talking to an assistant like Siri).
  2. The device captures the sound it hears from you and records it as a waveform.
  3. The ASR software cleans up background noise and normalizes the resulting sound.
  4. The cleaned and filtered waveform is broken down into phonemes, the basic units of sound that make up words; English has about 44 of them.
  5. The phonemes are linked into a chain, and the device analyzes this sequence, which enables it to “understand” your sentence.
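
To make steps 3 and 5 more concrete, here is a minimal Python sketch. It is a toy illustration, not how Siri or any real ASR engine works: the function names, the choice of peak normalization, and the tiny PRONUNCIATIONS table are all assumptions made for this example, and the hard part (turning audio into phonemes) is skipped entirely.

```python
def normalize(samples):
    """Peak-normalize: scale samples so the loudest one has magnitude 1.0.

    Assumption: real systems may use other normalization schemes.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0:
        return list(samples)  # silence (or empty input): nothing to scale
    return [s / peak for s in samples]


# Hypothetical pronunciation dictionary: phoneme chain -> word.
PRONUNCIATIONS = {
    ("S", "EH", "T"): "set",
    ("AH", "L", "AA", "R", "M"): "alarm",
}


def decode(phonemes):
    """Greedily match the longest known phoneme chain at each position."""
    words = []
    max_len = max(len(chain) for chain in PRONUNCIATIONS)
    i = 0
    while i < len(phonemes):
        # Try the longest candidate chain first, then shorter ones.
        for n in range(min(max_len, len(phonemes) - i), 0, -1):
            chunk = tuple(phonemes[i:i + n])
            if chunk in PRONUNCIATIONS:
                words.append(PRONUNCIATIONS[chunk])
                i += n
                break
        else:
            # No chain matched: mark one phoneme as unknown and move on.
            words.append("<unk>")
            i += 1
    return words


loud = normalize([0.5, -0.25, 0.1])
sentence = decode(["S", "EH", "T", "AH", "L", "AA", "R", "M"])
```

Here the phoneme chain for "set alarm" is matched one word at a time, and any phoneme that fits no known chain is flagged as unknown rather than guessed.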

ASR Applications

The two most important applications of ASR are direct dialogue conversations and natural language conversations.

1) Direct Dialogue Conversations


These conversations are the simplest form of ASR and IVR technology. They usually arise when a computer or device asks a person for input and expects the answer to come from a set of words already stored in the system. For instance, in voice conferencing, the IVRS will ask, “Please say your 8-digit passcode,” and we reply with a set of digits or letters that the system can understand. This application is common in banking, telephone reservation systems, and similar services.
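
The passcode prompt can be sketched in Python as a check against a fixed word list. This is only an illustration under stated assumptions: DIGIT_WORDS and parse_passcode are hypothetical names, the input is assumed to be an already-recognized transcript, and only spoken digits (not letters) are handled.

```python
# Hypothetical fixed vocabulary: the only words this prompt accepts.
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7",
    "eight": "8", "nine": "9",
}


def parse_passcode(transcript, length=8):
    """Return the digit string if the transcript is exactly `length`
    known digit words; otherwise return None."""
    digits = []
    for word in transcript.lower().split():
        digit = DIGIT_WORDS.get(word)
        if digit is None:
            return None  # word outside the expected vocabulary
        digits.append(digit)
    return "".join(digits) if len(digits) == length else None


code = parse_passcode("one two three four five six seven eight")
```

Because the system only has to tell a handful of expected words apart, anything outside that list can simply be rejected and the caller asked to repeat.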

2) Natural Language Conversations


Siri is one of the best examples of this form of ASR application. Natural language conversations are a more sophisticated and interactive use of ASR technology, closer to an ordinary human conversation with your device.

Benefits of Natural Language

Getting natural language software to work perfectly is a tedious job. With a 60,000-word vocabulary, about 216 trillion word combinations could be made, which is the number of possible three-word sequences (60,000 × 60,000 × 60,000).
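
The arithmetic behind that figure is easy to check. The short Python sketch below counts ordered word sequences of a given length; reading the 216 trillion figure as the three-word case is an inference from the numbers, and the variable names are chosen only for this example.

```python
vocabulary_size = 60_000

# Number of ordered word sequences of each length (repeats allowed).
# This counts every possible sequence, meaningful or not.
sequences_by_length = {
    length: vocabulary_size ** length for length in range(1, 5)
}

three_word_sequences = sequences_by_length[3]

for length, count in sequences_by_length.items():
    print(f"{length}-word sequences: {count:,}")
```

Each extra word multiplies the count by another 60,000, which is why an ASR system cannot simply store every sentence it might hear.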

ASR reacts to a prescribed list of words called tagged words, and it responds to this preselected list in a way that builds a context around them. For instance, if you say “Alarm,” the technology will respond with “Set Alarm” rather than “put” or “keep” alarm. It thus figures out, in grammatically correct form, that you want to set an alarm for a specific time.
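
A rough Python sketch of the tagged-word idea follows. The TAGGED_WORDS table and the interpret function are hypothetical, and only the "alarm" entry comes from the example above; a real assistant would use far richer context than a single keyword lookup.

```python
# Hypothetical tagged-word table: keyword -> canonical action phrase.
TAGGED_WORDS = {
    "alarm": "Set Alarm",
    "timer": "Start Timer",  # assumed entry, for illustration only
    "call": "Make Call",     # assumed entry, for illustration only
}


def interpret(utterance):
    """Return the action for the first tagged word found, or None."""
    for word in utterance.lower().split():
        token = word.strip(".,!?")  # drop trailing punctuation
        if token in TAGGED_WORDS:
            return TAGGED_WORDS[token]
    return None


action = interpret("put an alarm for 7")
```

Whatever verb the speaker uses, the tagged word "alarm" maps to the same canonical action, which is how the response comes out as "Set Alarm".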

The complexity increases with the number of words the ASR has to process. The larger the vocabulary, the more training the ASR's natural language algorithm requires.

Tuning Test: Learning curve of ASR

An ASR system is designed to learn from the people who use it. The more human interaction it is exposed to, the larger its vocabulary grows and the more meaningful sentences it stores in its database. This tuning happens in two ways: human tuning and active learning.

1) Human Tuning


In this method, programmers review the logs of conversations held with the device, then enter the new words and phrases they find into the ASR dictionary. This is a way of teaching the device.
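
As a sketch of what that review might look like, the Python snippet below lists words from conversation logs that are missing from the dictionary, most frequent first, so a person can decide which ones to add. The log format, the function name, and the sample data are assumptions made for this example.

```python
from collections import Counter


def unknown_words(log_lines, dictionary):
    """Count logged words that are not in the dictionary yet,
    returned as (word, count) pairs, most frequent first."""
    counts = Counter()
    for line in log_lines:
        for word in line.lower().split():
            token = word.strip(".,!?")
            if token and token not in dictionary:
                counts[token] += 1
    return counts.most_common()


# Hypothetical sample data.
logs = ["set alarm", "set reminder", "reminder please"]
known = {"set", "alarm", "please"}
candidates = unknown_words(logs, known)
```

The output is only a list of candidates; in human tuning a programmer still makes the final call on what enters the dictionary.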

2) Active Learning


In active learning, no programmer intervenes to teach the device. The system learns on its own by analyzing past conversations from the log and adapting to your speech patterns in future verbal exchanges. For instance, if you repeatedly cancel auto-correct on a specific word, the system will add the word to your dictionary and treat it as correct in future conversations.
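
The auto-correct example could be sketched in Python like this. The AdaptiveDictionary class, its method names, and the threshold of three rejections are all assumptions for illustration; the article does not say how many repetitions a real system waits for.

```python
from collections import Counter


class AdaptiveDictionary:
    """Toy dictionary that learns words the user keeps insisting on."""

    def __init__(self, words, threshold=3):
        self.words = {w.lower() for w in words}
        self.threshold = threshold    # assumed cut-off, not from the article
        self._rejections = Counter()  # word -> times auto-correct was cancelled

    def reject_correction(self, word):
        """Record that the user cancelled auto-correct on `word`."""
        word = word.lower()
        if word in self.words:
            return  # already known, nothing to learn
        self._rejections[word] += 1
        if self._rejections[word] >= self.threshold:
            self.words.add(word)
            del self._rejections[word]

    def is_known(self, word):
        return word.lower() in self.words


user_dict = AdaptiveDictionary({"hello"}, threshold=3)
user_dict.reject_correction("Zoltan")
user_dict.reject_correction("Zoltan")
known_after_two = user_dict.is_known("Zoltan")
user_dict.reject_correction("Zoltan")
known_after_three = user_dict.is_known("Zoltan")
```

No programmer is involved here: the word enters the dictionary purely because the user's own behaviour crossed the threshold.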

Looks interesting, right? We bet it does, and if you want to explore and understand more about speech recognition, try the excellent IVR infographic from West Interactive.

How Your Devices Learn to Talk to You : InfoGraphic