Wednesday, 19 October 2016 13:05

Microsoft cracks the code on speech recognition – as good as humans

By

Microsoft has “cracked the code” achieving computer speech recognition on parity with humans for conversational speech.

This has long been the goal – a computer than can recognise conversational language with human accuracy. It is defined as Word Error Rate and humans generally get about 5.9% wrong. Microsoft says it is the lowest recorded against the industry standard Switchboard speech recognition task.

Suffice to say the researchers at Microsoft AI are chuffed and have published the paper in the Cornell University Library.

The abstract says, “The key to our system's performance is the systematic use of convolutional and LSTM neural networks, combined with a novel spatial smoothing method and lattice-free MMI acoustic training.” 

“We’ve reached human parity,” said Xuedong Huang, the company’s chief speech scientist. “This is a historic achievement.”

The milestone means that, for the first time, a computer can recognise the words in a conversation as well as a person would. In doing so, the team has beat a goal they set less than a year ago — and greatly exceeded everyone else’s expectations as well.

“Even five years ago, I wouldn’t have thought we could have achieved this. I just wouldn’t have thought it would be possible,” said Harry Shum, the executive vice-president who heads the Microsoft Artificial Intelligence and Research group.

An important point to note is that the research team has reached parity with humans – it is not yet better. But part of that allows the computer to guess what a misspelled, or misheard word is, and to fill in any gaps.

How did they do it?

To reach the human parity milestone, the team used Microsoft’s Computational Network Toolkit (CNTK), a homegrown system for deep learning that the research team has made available on GitHub via an open source licence.

Huang said CNTK’s ability to quickly process deep learning algorithms across multiple computers running a specialised chip called a graphics processing unit vastly improved the speed at which they were able to do their research and, ultimately, reach human parity.

Microsoft has been on a roll lately. Last week another group of Microsoft researchers, who are focused on computer vision, reached a milestone of their own. The team won first place in the COCO image segmentation challenge, which judges how well a technology can determine where certain objects are in an image.

So what does this mean?

It has broad implications for consumer and business products that can be significantly augmented by speech recognition. That includes consumer entertainment devices like the Xbox, accessibility tools such as instant speech-to-text transcription, and personal digital assistants such as Cortana.

“This will make Cortana more powerful, making a truly intelligent assistant possible,” Shum said.

Zweig said the researchers are now working on ways to make sure that speech recognition works well in more real-life settings. That includes places where there is a lot of background noise, such as at a party. They will also focus on better ways to help the technology assign names to individual speakers when multiple people are talking, and on making sure that it works well with a wide variety of voices, regardless of age, accent or ability.

In the longer term, researchers will focus on ways to teach computers not just to transcribe the acoustic signals that come out of people’s mouths, but instead to understand the words they are saying. That would give the technology the ability to answer questions or take action based on what they are told.

“The next frontier is to move from recognition to understanding,” Zweig said.

MS Speech recog team


Please join our community here and become a VIP.

Subscribe to ITWIRE UPDATE Newsletter here
JOIN our iTWireTV our YouTube Community here
BACK TO LATEST NEWS here

PROMOTE YOUR WEBINAR ON ITWIRE

It's all about Webinars.

Marketing budgets are now focused on Webinars combined with Lead Generation.

If you wish to promote a Webinar we recommend at least a 3 to 4 week campaign prior to your event.

The iTWire campaign will include extensive adverts on our News Site itwire.com and prominent Newsletter promotion https://itwire.com/itwire-update.html and Promotional News & Editorial. Plus a video interview of the key speaker on iTWire TV https://www.youtube.com/c/iTWireTV/videos which will be used in Promotional Posts on the iTWire Home Page.

Now we are coming out of Lockdown iTWire will be focussed to assisting with your webinatrs and campaigns and assassistance via part payments and extended terms, a Webinar Business Booster Pack and other supportive programs. We can also create your adverts and written content plus coordinate your video interview.

We look forward to discussing your campaign goals with you. Please click the button below.

MORE INFO HERE!

INTRODUCING ITWIRE TV

iTWire TV offers a unique value to the Tech Sector by providing a range of video interviews, news, views and reviews, and also provides the opportunity for vendors to promote your company and your marketing messages.

We work with you to develop the message and conduct the interview or product review in a safe and collaborative way. Unlike other Tech YouTube channels, we create a story around your message and post that on the homepage of ITWire, linking to your message.

In addition, your interview post message can be displayed in up to 7 different post displays on our the iTWire.com site to drive traffic and readers to your video content and downloads. This can be a significant Lead Generation opportunity for your business.

We also provide 3 videos in one recording/sitting if you require so that you have a series of videos to promote to your customers. Your sales team can add your emails to sales collateral and to the footer of their sales and marketing emails.

See the latest in Tech News, Views, Interviews, Reviews, Product Promos and Events. Plus funny videos from our readers and customers.

SEE WHAT'S ON ITWIRE TV NOW!

BACK TO HOME PAGE
Ray Shaw

joomla stats

Ray Shaw ray@im.com.au  has a passion for IT ever since building his first computer in 1980. He is a qualified journalist, hosted a consumer IT based radio program on ABC radio for 10 years, has developed world leading software for the events industry and is smart enough to no longer own a retail computer store!

Share News tips for the iTWire Journalists? Your tip will be anonymous

WEBINARS ONLINE & ON-DEMAND

GUEST ARTICLES

VENDOR NEWS

Guest Opinion

Guest Interviews

Guest Reviews

Guest Research

Guest Research & Case Studies

Channel News

Comments