A Novel Application of Speech Recognition
We are applying Automatic Speech Recognition (ASR) in a novel way to a novel service and we would like help to take it to the next level. As a relatively small company, we have limited R&D resources and we are concentrating those efforts on the artificial intelligence aspects of our service. So we would like to collaborate with a company or organisation that is focussed on developing ASR systems.
Even if we had unlimited R&D resources, there would be no point in conducting fundamental research on ASR when there are already existing systems that would meet our needs. And with time, these will get even better. The realisation that ASR could be fruitfully applied arose largely because of the author’s background in ASR and his early work in quantifying the importance of linguistic context in maximising ASR performance.
We have developed a service which aggregates all of an organisation’s digital messages - emails, telephone calls and other types of message - and stores them in a Cloud database. By combining messages from many contacts and sources, and inferring what is private and what may be shared, we can identify events and trends that could not otherwise be identified from, say, a single employee’s email inbox. This work is described in our IET paper “Untangling your Threads - a novel cloud computing application”.
We wish to extend our service to permit users to search for keywords in telephone calls - which are subject to the normal telephony constraints of bandwidth and signal/noise ratio, etc. Since we know the identities of participants and can correlate their messages across different media, we have a massive amount of contextual information to input to our ASR model - something not always available to general ASR applications.
We do not require verbatim speech transcription, nor do we need to identify keywords with low information value. We are able to create vocabularies of keywords and their probabilities of being uttered. We do not need to perform the ASR in real time, nor to host or own the ASR algorithms deployed.
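To illustrate how a vocabulary of keywords and their probabilities of being uttered might be combined with the output of an ASR engine, here is a minimal sketch in Python. It is not our implementation. It assumes a hypothetical ASR engine that reports, for each candidate keyword, a likelihood ratio (how much more probable the audio is if the keyword was spoken than if it was not), and it uses Bayes' rule in odds form to turn a contextual prior into a posterior. The names Detection, posterior and rank_keywords, the default prior and the threshold are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One candidate keyword reported by a (hypothetical) ASR engine."""
    keyword: str
    # Assumed engine output: P(audio | keyword spoken) / P(audio | keyword not spoken)
    likelihood_ratio: float


def posterior(prior: float, likelihood_ratio: float) -> float:
    """Combine a contextual prior with acoustic evidence (Bayes' rule, odds form)."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


def rank_keywords(detections, priors, default_prior=0.001, threshold=0.5):
    """Return (keyword, posterior) pairs at or above threshold, most probable first.

    priors maps each keyword to its probability of being uttered, which could be
    derived from context such as the participants' other messages (an assumption
    made for this sketch). Keywords missing from priors get default_prior.
    """
    accepted = []
    for d in detections:
        p = posterior(priors.get(d.keyword, default_prior), d.likelihood_ratio)
        if p >= threshold:
            accepted.append((d.keyword, p))
    return sorted(accepted, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Illustrative numbers only; they are not measured results.
    priors = {"invoice": 0.2, "merger": 0.01}
    detections = [Detection("invoice", 9.0), Detection("merger", 9.0)]
    for keyword, p in rank_keywords(detections, priors):
        print(f"{keyword}: {p:.2f}")
```

With identical acoustic evidence, the keyword that context makes more likely is accepted and the other is rejected, which is the effect we hope to gain from contextual information.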
We have done experimental work processing utterances using various commercial and open source ASR products on our own corpus of 10,000 phone calls and the ENRON corpus. This work shows good promise and is described in our IoA paper "The contribution of automatic speech recognition for keywords to assist in the integrated organisation of digital messages."
There may already be published work on a similar application, but if so, we haven’t found it. We do not wish to reinvent the wheel, so we’d like to hear from anyone in the ASR community who might be able to point us to relevant work or who is interested in working with us - either in providing a service or licensing code.
In summary, our application involves:
- Identifying high information value keywords in continuous telephone speech (initially English).
- One known speaker and possibly one unknown speaker, separated on a full-duplex channel.
- Training may be possible for some speakers.
- Licensing of ASR code or software as a service.
- Real-time operation is a NON-requirement.
As an aside, our service also provides an almost ideal application for Voice Recognition (the term is used here correctly to mean understanding who is speaking rather than what they are saying). Again, we have made good progress in this and hope to publish our results at a later date.
If you are interested in collaborating with us or can provide any other help or advice, then please do connect via LinkedIn or email "threads_asr@jpy.com".
John Yardley