Friday, 02 October 2020 09:40

UNSW leads research into speech recognition systems


Engineers from the University of NSW are leading a research project aimed at closing a gap in speech recognition systems, which they say have until now performed poorly at understanding young users of technology.

The project will see UNSW Sydney researchers sample the voices of Australian kids so that they can be better understood by devices that use voice recognition software.

And the researchers say the benefits could also flow into education and speech therapy, where digital devices could provide “immediate and ongoing feedback in speech training and other learning tasks”.

UNSW says that until now, the speech recognition software powering virtual assistants such as Google Assistant, Alexa and Siri has relied on a growing database of adult voices.

But UNSW says all that is about to change with the launch of AusKidTalk, a joint project of five Australian universities that aims to build a database of Australian children’s voices.

Dr Beena Ahmed, a senior lecturer with UNSW’s School of Electrical Engineering and Telecommunications, says that while speech recognition technology has advanced in leaps and bounds over the last decade, it still lags when it comes to understanding and speaking with children.

“There’s been a big improvement in speech recognition to work with different accents and languages,” she says.

“But so far that has just been for adults. There is a definite shortage of data for children – not just in Australia, but all over the world. This is despite children being such an important demographic. Companies like Amazon, Apple and Google are all starting to notice that this is a big market.”

Dr Ahmed and her fellow engineers, linguists, psychologists and speech pathologists are about to start recruiting 750 children between the ages of three and 12 to provide speech samples as part of the AusKidTalk program.

In soundproof studios at each of the five universities, the children will be recorded as they are prompted to repeat words, digits and sentences, before taking part in unscripted storytelling exercises.

UNSW says linguists and psychologists will use the new database of children’s speech to better understand how children develop speech and language, while engineers will be able to use it to develop speech recognition systems that interact with younger users much more seamlessly.

Dr Ahmed says the accuracy of speech recognition systems when interacting with children has so far been quite poor.

“The main reason for this is because children’s speech is quite different from adults’ speech.

“Children’s language skills aren’t as sophisticated as adults’. They might mispronounce or leave sounds or words out, or change the expected order of words. Then there are physiological differences – their vocal tract isn’t fully developed, and until they hit puberty, they speak in much higher pitches. All this makes their speech very different from adults and therefore harder for speech recognition systems to process.”

In addition to recording samples of typical speech, the researchers will also be recording samples of disordered speech spoken by children.

Dr Ahmed says the idea is that if speech recognition systems could be taught to recognise when children are having problems forming words, they could not only understand voice commands spoken by kids with impaired speech, but could also be used therapeutically to help with speech training on a mobile device.

“Speech therapy is a very costly business,” Dr Ahmed says.

“You’ve got parents spending up to $200 for a session with a clinician, and still having to do a lot of home practice that the clinician can’t monitor.

“Another problem is that parents can also find it hard to provide feedback themselves, because they’re not properly trained or because they’re already tuned to understand their kids in cases where others might not.

“But with an automated speech therapy tool, kids and parents could get instant feedback when they practice what they’ve learned with the clinician.

“It would give children immediate and ongoing access. You can’t expect this level of attention from limited appointments with limited numbers of available pathologists.”

Dr Ahmed says speech recognition systems using a database of children’s voices could also have benefits in education.

“A lot of schools rely on getting parent volunteers to listen to children doing their reading in early education. But in schools that may have trouble getting enough parent volunteers, a child could read to a tablet or computer which could listen and correct them as they went.”

The UNSW researchers say the COVID-19 pandemic has shown just how important remote communication and learning tools are.

“Unfortunately, children have not been able to benefit from these tools as much as adults due to a lack of effective speech-based tools for remote speech therapy and learning – so they likely have not been able to get the same benefit from telehealth and tele-education tools,” Dr Ahmed says.

UNSW says that once samples from the 750 children have been recorded and integrated into a speech recognition system, an open-source database will be made available online for other researchers to work with.

The project is expected to be complete by June 2021.


Peter Dinham

Peter Dinham is a mentor, volunteer coach and writer, and a much-valued founding partner of iTWire. He is a veteran journalist and corporate communications consultant. He has worked as a journalist in all forms of media: newspapers and magazines, radio, television, press agency and now online, including with the Canberra Times, The Examiner (Tasmania), the ABC and AAP-Reuters. As a freelance journalist, he also had articles published in Australian and overseas magazines. He worked in the corporate communications and public relations sector, in-house with an airline, and as a senior executive in Australia of the world’s largest communications consultancy, Burson-Marsteller. He also ran his own communications consultancy and was a co-founder in Australia of the global photographic agency the Image Bank (now Getty Images).
