Tony Austin
Sunday, 17 August 2008 08:07
Speech recognition technology has come a long way since I first investigated it in the mid 1990s.
Fascinated by the intricacies of human languages since my youth, I delved deep into the software available at that time before giving up on it as "not ready yet" several years before the turn of the century.
Here I'm not talking about the interactive voice response (IVR) systems that we've all done battle with when calling our telco or bank, which are designed to cater for a limited range of input: single words and short phrases such as "Yes", "No", "Billing", "Technical Support" and so on (and which all seem to be blithely unresponsive when you snap "Let me talk to a real person, damn you!").
No, I'm referring to desktop software applications like Dragon NaturallySpeaking from Nuance Communications, Inc. that transcribe your speech into text and save you having to type it in.
As mentioned in Wikipedia (when describing medical transcription):
"... at its inception, speech recognition (SR) was sold as a way to completely eliminate transcription rather than make the transcription process more efficient, hence it was not accepted. It was also the case that SR at that time was often technically deficient. Additionally, to be used effectively, it required changes to the ways physicians worked and documented clinical encounters, which many if not all were reluctant to do. The biggest limitation to speech recognition automating transcription, however, is seen as the software. The nature of narrative dictation is highly interpretive and often requires judgment that may be provided by a real human but not yet by an automated system. Another limitation has been the extensive amount of time required by the user and/or system provider to train the software."
That's how I remember it from the 1990s. A major issue is that by their very nature SR algorithms have to use a "brute force" approach, since spoken language is notoriously difficult for machines (and even non-native speakers) to interpret.
Thinking of the English language alone, much less entirely different languages, you've doubtless heard the jocular claim "England and America are two countries divided by a common language" and there's the nub of the problem. The algorithms have lots of work to do in order to make sense of each spoken word in a particular context, and the desktop systems of the 1990s just didn't have the "grunt" (the raw processing power) to do a very good job of it.
After severely crushing the end of my right index finger in a sliding door mishap nearly two years ago, and having surgery to reattach the nail and stitch up the gaping wound, I've found that -- even after waiting many months -- the delicate feel has never come back and my touch typing speed has suffered dramatically. (Why did it have to be the very finger that I rely on so heavily as the anchor for touch typing? That's got to be Murphy's Law in action!)
About a month ago, I decided to review the state of speech recognition software some ten years on. Has it improved much, and could it assist me? Is it worth using even if you don't have some sort of disability?
What are my conclusions? Please read on to find out, and listen to the interesting podcast.
