A Novel Application of Speech Recognition

John Yardley

Published Mar 3, 2017

We are applying Automatic Speech Recognition (ASR) in a novel way to a novel service and we would like help to take it to the next level. As a relatively small company, we have limited R&D resources and we are concentrating those efforts on the artificial intelligence aspects of our service. So we would like to collaborate with a company or organisation that is focussed on developing ASR systems.

Even if we had unlimited R&D resources, there would be no point in conducting fundamental research on ASR when there are already existing systems that would meet our needs. And with time, these will get even better. The realisation that ASR could be fruitfully applied arose largely because of the author’s background in ASR and his early work in quantifying the importance of linguistic context in maximising ASR performance.

We have developed a service that aggregates all of an organisation’s digital messages - emails, telephone calls and other types of message - and stores them in a Cloud database. By combining messages from many contacts and sources, and inferring what is private and what may be shared, we can identify events and trends that could not otherwise be identified from, say, a single employee’s email inbox. This work is described in our IET paper “Untangling your Threads - a novel cloud computing application”.

We wish to extend our service to permit users to search for keywords in telephone calls - which are subject to the normal telephony constraints of bandwidth and signal/noise ratio, etc. Since we know the identities of participants and can correlate their messages across different media, we have a massive amount of contextual information to input to our ASR model - something not always available to general ASR applications.
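One way to picture how contextual information might be fed into keyword recognition is as a rescoring step. The following is only a minimal sketch under stated assumptions, not a description of our implementation: the function `rescore`, the `weight` and `floor` parameters, and the example scores are all hypothetical, and it assumes the ASR engine returns candidate keywords with acoustic probabilities greater than zero.

```python
import math

def rescore(candidates, context_prior, weight=1.0, floor=1e-6):
    """Rank candidate keywords by log acoustic probability plus a
    weighted log contextual prior (hypothetical log-linear combination).

    candidates    -- list of (keyword, acoustic_probability) pairs, each > 0
    context_prior -- dict mapping keyword -> prior probability derived
                     from the participants' other messages
    floor         -- prior assigned to keywords never seen in context
    """
    scored = []
    for word, acoustic_p in candidates:
        prior = context_prior.get(word, floor)
        score = math.log(acoustic_p) + weight * math.log(prior)
        scored.append((word, score))
    # Highest combined score first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Illustrative values only: the acoustically weaker candidate wins
# because it appears in the participants' written correspondence.
candidates = [("envoys", 0.45), ("invoice", 0.40)]
context_prior = {"invoice": 0.05}
ranked = rescore(candidates, context_prior)
best_keyword = ranked[0][0]
```

In this sketch the contextual prior outweighs a small acoustic advantage, which is the kind of gain that narrow-band telephone speech might be expected to need.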

We do not require verbatim speech transcription, nor do we need to identify keywords with low information value. We are able to create vocabularies of keywords and their probabilities of being uttered. We do not need to perform the ASR in real time, nor to host or own the ASR algorithms deployed.
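As an illustration of what a keyword vocabulary with utterance probabilities could look like, here is a minimal sketch that assumes the probabilities are estimated as relative word frequencies in a contact's written messages. The stop list, the `min_length` threshold and the function name `keyword_probabilities` are hypothetical choices made for the example, not part of our service.

```python
import re
from collections import Counter

# Hypothetical stop list standing in for "low information value" words.
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "it",
              "for", "on", "we", "you", "i", "that", "this", "with", "be"}

def keyword_probabilities(messages, min_length=3):
    """Return {keyword: relative frequency} over the given message texts,
    skipping stop words and very short tokens."""
    counts = Counter()
    for text in messages:
        for token in re.findall(r"[a-z']+", text.lower()):
            if len(token) >= min_length and token not in STOP_WORDS:
                counts[token] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}

# Invented example messages between two contacts.
emails = [
    "Please send the invoice for the Henderson contract.",
    "The Henderson invoice is attached.",
]
vocab = keyword_probabilities(emails)
```

A vocabulary built this way could be produced per contact or per pair of participants, so that each call is searched against the keywords most likely to occur in it.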

We have done experimental work processing utterances using various commercial and open source ASR products on our own corpus of 10,000 phone calls and the ENRON corpus. This work shows good promise and is described in our IoA paper "The contribution of automatic speech recognition for keywords to assist in the integrated organisation of digital messages."

There may already be published work on a similar application, but if so, we haven’t found any. We do not wish to reinvent the wheel, so we’d like to hear from anyone in the ASR community who might be able to point us to relevant work or who is interested in working with us - either in providing a service or licensing code.

In summary, our application involves:

Identifying high information value keywords in continuous telephone speech (initially English).
Single known speaker and possibly single unknown speaker - separated on full duplex channel.
Training may be possible for some speakers.
Licensing of ASR code or software as a service.
Real-time operation is a NON-requirement.

As an aside, our service also provides an almost ideal application for Voice Recognition (the term is used here correctly to mean identifying who is speaking rather than what they are saying). Again, we have made good progress in this and hope to publish our results at a later date.

If you are interested in collaborating with us, or can provide any other help or advice, then please do connect via LinkedIn or email "threads_asr@jpy.com".
