Speaking more naturally than ever - iTWire podcast
by Tony Austin   
Sunday, 17 August 2008
In this iTWire podcast, Derek Austin of Nuance Communications Australia outlines the history of speech recognition, tells us why Dragon NaturallySpeaking 10 is even better than previous releases, and points us in the right direction to take advantage of its advanced features and make our lives easier.

Speech recognition technology has come a long way since I first investigated it in the mid-1990s.

I've been fascinated by the intricacies of human languages since my youth, and I delved deep into the software available at that time before giving up on it as "not ready yet" several years before the turn of the century.

Here I'm not talking about the interactive voice response (IVR) systems that we've all done battle with when calling our telco or bank, which are designed to cater for a limited range of input: single words and short phrases such as "Yes", "No", "Billing", "Technical Support" and so on (and which all seem to be blithely unresponsive when you snap "Let me talk to a real person, damn you!").

No, I'm referring to desktop software applications like Dragon NaturallySpeaking from Nuance Communications, Inc. that transcribe your speech into text and save you having to type it in.

As mentioned in Wikipedia (when describing medical transcription):

"... at its inception, speech recognition (SR) was sold as a way to completely eliminate transcription rather than make the transcription process more efficient, hence it was not accepted. It was also the case that SR at that time was often technically deficient. Additionally, to be used effectively, it required changes to the ways physicians worked and documented clinical encounters, which many if not all were reluctant to do. The biggest limitation to speech recognition automating transcription, however, is seen as the software. The nature of narrative dictation is highly interpretive and often requires judgment that may be provided by a real human but not yet by an automated system. Another limitation has been the extensive amount of time required by the user and/or system provider to train the software."

That's how I remember it from the 1990s. A major issue is that by their very nature SR algorithms have to use a "brute force" approach, since spoken language is notoriously difficult for machines (and even non-native speakers) to interpret.

Thinking of the English language alone, let alone entirely different languages, you've doubtless heard the jocular claim "England and America are two countries divided by a common language", and there's the nub of the problem. The algorithms have lots of work to do in order to make sense of each spoken word in a particular context, and the desktop systems of the 1990s just didn't have the "grunt" (the raw processing power) to do a very good job of it.

After severely crushing the end of my right index finger in a sliding door mishap nearly two years ago, and having surgery to reattach the nail and stitch up the gaping wound, I've found that, even after waiting many months, the delicate feel has never come back and my touch typing speed has suffered dramatically. (Why did it have to be the very finger that I rely on so heavily as the anchor for touch typing? That's got to be Murphy's Law in action!)

About a month ago, I decided to review the state of speech recognition software some ten years on. Has it improved much, and could it assist me? Is it worth using even if you don't have some sort of disability?

What are my conclusions? Please read on to find out, and listen to the interesting podcast.



 