Smarter Flexible Speech Recognition
Closer to the Human Brain Recognising Speech directly from the Audio
Professor Lahiri has held two ERC Advanced Grants and two ERC Proof of Concept grants for work in this area, carrying out fundamental investigation of variation in speech and leading to a linguistic model of speech based on phonological features: the articulatory and acoustic properties of each sound that form its contrasts with others. For example, the ‘voicing’ feature (whether the vocal cords are vibrating or not) forms a component of the contrast between the ‘p’ and ‘b’ consonant sounds in English.
Using the funding, the team developed a speech recognition system trained to recognise a universal set of 19 such features, which it combines to identify speech sounds, or phones. Importantly, it targets the features essential to human understanding of speech, and ignores or tolerates those that can vary across speakers or utterances.
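As a simplified illustration of the idea, a phone can be modelled as a bundle of phonological features, so that the contrast between two sounds is just the set of features on which they differ. The feature names below are a hypothetical toy inventory for illustration only, not FlexSR's actual 19-feature set:

```python
# Toy sketch: phones as bundles of phonological features.
# Feature names are illustrative, not the FlexSR inventory.
P_FEATURES = {"consonantal", "labial", "plosive"}           # English 'p'
B_FEATURES = {"consonantal", "labial", "plosive", "voice"}  # English 'b'

def feature_contrast(a: set, b: set) -> set:
    """Return the features that distinguish two phones."""
    return a ^ b  # symmetric difference: features present in one but not both

print(feature_contrast(P_FEATURES, B_FEATURES))  # {'voice'}
```

Here the only distinguishing feature between ‘p’ and ‘b’ is voicing, matching the example above; all other articulatory properties are shared and can be tolerated as variation.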
The research team has also used this model to develop a language-learning app, which analyses words and sentences spoken by the user and provides detailed feedback. In this way, language learners receive personalised responses to improve their pronunciation.
The novel inventions: Automatic Speech Recognition and a System for Automatic Speech Analysis
International Application Published Under the Patent Cooperation Treaty (PCT), World Intellectual Property Organization
Front-line operatives face dangerous situations where their hands may not be free to operate body-worn or mobile devices, or to push an emergency call button. In these situations, FlexSR can run on a device, monitoring real-time speech for trigger words and then initiating a broadcast for assistance. FlexSR can be managed without specialist training, and can run on body-worn video cameras or radios, or on a smartphone, either directly or as a companion device to the camera or radio.
In hostile environments, the user may be unable to look away from the task, hands and fingers may be gloved for protection, and keyboard interfaces are impractical or unsafe to glance down at. Such environments require real-time control on the device itself, without the latency risk of the cloud, and may require in-field operation without specialist skills training. FlexSR requires minimal compute resources and can run on modern wearable devices built for industrial or military use.
Speech surveillance is a huge challenge for conventional ASR when non-native or regional-accent speakers use different languages, since a model must be trained for each and every combination. FlexSR does not require models to be built or trained: it can match words and phrases against any lexicon represented in the IPA and overlay different accents to improve accuracy. It can rapidly recognise words without performing an entire speech-to-text transcription.
Voice is still used to place requests or orders for products with brokers and advisors, or to make price bids and offers. Human errors can occur when this information is entered into an electronic system, and speech recognition can verify any manual entry before order execution. Identifying single words or short phrases is a challenge for conventional model-based systems. FlexSR recognises the linguistic features of words and phrases directly from the signal and matches them to the phonological expressions in a lexicon of words of interest. This enables quick and easy construction of the required lexicon and accommodates speaker accents, without the need for model building or training.
Adults often have difficulty learning, and even perceiving, new sounds that are not present in their native language. FlexSR can take a target phrase whose phonemes have known features, compare it to the user's attempt at saying it, detect any mispronunciations, and give feedback, using FlexSR's novel techniques to recognise the features directly from the speech input signal. This can be used by developers of self-service language-learning systems. In addition, the same capability can store native-speaker pronunciations for use with FlexSR's speech recognition when a speaker is using a non-native language or has a strong regional accent that deviates from the standard.
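A feature-level comparison makes this kind of feedback more informative than a simple right/wrong verdict: the system can name the specific articulatory property that went astray. The sketch below assumes a toy feature inventory and a position-by-position comparison, both hypothetical simplifications of the actual system:

```python
# Hypothetical sketch: per-phoneme mispronunciation feedback by comparing
# the feature bundle of each target phone with the phone recognised from
# the learner's speech. Feature inventory is an illustrative toy set.
FEATURES = {
    "p": {"consonantal", "labial", "plosive"},
    "b": {"consonantal", "labial", "plosive", "voice"},
    "ɪ": {"vocalic", "high", "front"},
    "n": {"consonantal", "nasal", "coronal"},
}

def feedback(target: list[str], attempt: list[str]) -> list[str]:
    """Report which features deviate at each phone position."""
    notes = []
    for i, (t, a) in enumerate(zip(target, attempt)):
        if t != a:
            diff = FEATURES[t] ^ FEATURES[a]  # features that differ
            notes.append(f"position {i}: expected '{t}', heard '{a}' "
                         f"(differs in {sorted(diff)})")
    return notes

# Learner says "bin" instead of "pin": only voicing differs at position 0.
print(feedback(["p", "ɪ", "n"], ["b", "ɪ", "n"]))
```

Feedback phrased in terms of a single feature ("your vocal cords were vibrating") is the kind of targeted correction a learner can act on, rather than merely being told the word was wrong.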
Conventional ASR transcription services rely on building models for each language, which requires 250+ hours of speech to train the acoustic models. This is time-consuming and expensive, and does not allow for non-native pronunciations; language models for non-native speakers must be built as well. FlexSR recognises the linguistic features directly from the speech signal and matches them to the phonological features in the lexicon without any acoustic modelling. It can also overlay non-native accents from our pronunciation solution. This enables any combination of languages and accents to be supported at massively lower cost and over a very short time, giving faster time to market for new transcription products.
Copyright © 2024 FlexSR - All Rights Reserved.