your mouth as you say something adds important information that helps a
computer recognize certain speech utterances. This information is known as
visual phonemes, or ``visemes''. Visemes complement the phonetic stream by
separating sounds that are easily confused by ear. For example, ``mi''
and ``ni'', which are confusable acoustically, especially in noisy environments, are
easy to distinguish visually: in ``mi'' the lips close at onset, whereas in ``ni'' they do
not. Similarly, ``f'' and ``s'', which are difficult to distinguish acoustically, belong to
two different viseme groups. Experiments are underway, and in the not too distant
future it may be possible to combine face detection with voice recognition to
improve recognition accuracy.
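
As a rough illustration of how a visual cue can break an acoustic tie, here is a minimal Python sketch. It is not taken from any actual recognizer: the viseme group names, the `disambiguate` function and the scores are all hypothetical, chosen only to mirror the ``mi''/``ni'' example above.

```python
# Hypothetical viseme groups keyed by onset consonant. Illustrative only,
# not a standard viseme inventory.
VISEME_GROUP = {
    "m": "lips_closed",    # lips close at onset
    "n": "lips_open",      # lips stay apart
    "f": "lip_to_teeth",   # lower lip touches upper teeth
    "s": "lips_open",
}


def disambiguate(acoustic_scores, observed_viseme):
    """Return the best-scoring syllable whose onset matches the observed viseme."""
    consistent = {
        syllable: score
        for syllable, score in acoustic_scores.items()
        if VISEME_GROUP.get(syllable[0]) == observed_viseme
    }
    # Fall back to audio alone if the visual cue rules out every candidate.
    pool = consistent or acoustic_scores
    return max(pool, key=pool.get)


# Made-up scores: in noise, the audio alone slightly prefers the wrong syllable.
noisy_scores = {"mi": 0.49, "ni": 0.51}
print(disambiguate(noisy_scores, "lips_closed"))
```

With the camera reporting closed lips at onset, the sketch picks ``mi'' even though the audio score for ``ni'' is marginally higher.
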
I can still remember the first time I witnessed speech recognition. It was in 1981
and at the time I was assistant to the CFO of IBM. One day he was invited to visit
Yorktown Heights, home of the Thomas J. Watson Research Center where
hundreds of researchers have created many of the world’s great inventions.  The
main purpose of that particular day’s visit was to get an update on the state of
speech recognition. A group of us entered a huge room that was full of
computers. I had never seen such a large computing center. It was enormous.
We all huddled around the console of this “supercomputer” while several PhDs
prepared the demonstration. One of them sat at the console in front of a large
microphone; the setup looked not unlike a radio station. We were all asked to please be
silent. You could have heard a pin drop. The researcher got very
close to the microphone and, with perfect articulation, said the word “nine”.
We waited and waited and waited. Seemed like forever. Like waiting for a pan of
water to boil. Finally, a response came. We all crowded up to see the video
console where it displayed a 9. Our mouths dropped open in awe.
Accessible to all
The NGI offers great hope to those who have speech, language or hearing
impairments. New therapeutic technologies are being developed too. A
computerized language tool called SpeechViewer transforms spoken words and
sounds into imaginative graphics. The result is greatly increased effectiveness of
speech therapy and speech modification for people who need it. Users can select
from over a dozen language exercises. Each exercise responds to the speaker’s voice
input with immediate, clear and meaningful feedback that helps the speaker “see how to
speak.” In addition, the speaker receives animated rewards that reinforce
successful responses. This kind of technology can be of great help to
people of all ages who have a variety of disabilities, such as speech or language
impairments, cerebral palsy, developmental delay, traumatic brain injury, and
speech disorders resulting from a stroke. The technology provides a real boost in
effectiveness for professional speech-language pathologists, special education
teachers, teachers of the deaf, English-as-a-second-language instructors, and
professionals working with accent reduction.