Intelligence experts ask for speech-recognition software that works in noisy, echo-ridden rooms
WASHINGTON, 19 Nov. 2014. U.S. intelligence experts are asking industry, colleges, and the public to design speech-recognition software that can decipher conversations and other speech captured by microphones in noisy, echo-ridden environments.
Officials of the Intelligence Advanced Research Projects Agency (IARPA) in Washington are launching the Automatic Speech recognition In Reverberant Environments (ASpIRE) challenge.
ASpIRE seeks to enable automatic speech recognition technology to perform well across a variety of acoustic environments and recording scenarios on natural conversational speech.
"Building speech recognition systems that work well on speech recorded in mismatched noisy, reverberant environments is essential for many Intelligence Community (IC) missions," explains Mary Harper, the ASpIRE program manager at IARPA, which is the research arm of the U.S. Director of National Intelligence.
The ASpIRE challenge seeks speech-recognition systems that can be trained on conversational telephone speech, yet work well on far-field microphone data from noisy, reverberant rooms. The program is a spinoff of the IARPA Babel program to develop technology that works on any human language.
Participants can download sample data, representing microphone recordings in real rooms, on which to test their algorithms. Participants must submit transcriptions for the test set.
Previous work has shown that automatic speech recognition (ASR) performance degrades in room microphone conditions, especially when data used for training is mismatched with data used in testing.
The ASpIRE challenge asks for approaches that mitigate the effects of these conditions and produce software that functions in many acoustic environments and recording scenarios. Participants can address either a single-microphone or a multi-microphone scenario.
When the challenge begins, 15 hours of data (divided into a development set and a development-test set) will be posted on the challenge website. These data, which consist of multi-microphone recordings of conversational speech with transcriptions, are meant to be used for optimization, training selection, and tuning. At any time during this period, solvers may run their software against the data and revise their solutions.
During the evaluation period, participants will be given 10 hours of new far-field microphone data from noisy, reverberant rooms.
The algorithm that produces the lowest word error rate in the single-microphone condition will receive $30,000, and the algorithm that produces the lowest word error rate in the multiple-microphone condition will receive $20,000.
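Word error rate, the metric used to rank entries, is typically computed as the word-level edit distance (substitutions, deletions, and insertions) between a system's transcript and the reference transcript, divided by the number of reference words. IARPA's exact scoring rules are not detailed here; the following is only a minimal sketch of the standard calculation:

```python
# Minimal sketch of word error rate (WER): Levenshtein distance over
# words, divided by the number of words in the reference transcript.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub,                # substitution (or match)
                           dp[i - 1][j] + 1,   # deletion
                           dp[i][j - 1] + 1)   # insertion
    return dp[len(ref)][len(hyp)] / len(ref)

# One dropped word against a six-word reference
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

A lower WER means a closer match to the reference, so the winning entries are those whose transcripts require the fewest word-level corrections.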
The single-microphone evaluation runs 4-11 Feb. 2015, and the multiple-microphone evaluation runs 12-19 Feb. 2015. Anyone at least 18 years old is eligible to participate, and IARPA officials expect analysts, natural-language-processing specialists, machine-learning programmers, and others to take part.