2016
Operational Assessment of Keyword Search on Oral History
Elizabeth Salesky | Jessica Ray | Wade Shen
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
This project assesses the resources necessary to make oral history searchable by means of automatic speech recognition (ASR). Applying ASR to conversational speech poses inherent challenges, among them smaller training set sizes and varying demographics. We assess the impact of dataset size, word error rate, and term-weighted value on human search capability through an information retrieval task on Mechanical Turk. We use English oral history data collected by StoryCorps, a national organization that provides all people with the opportunity to record, share, and preserve their stories, and we control for a variety of demographics including age, gender, birthplace, and dialect across four training set sizes. We show that search performance with a standard speech recognition system is comparable to that with hand-transcribed data, which is promising for increased accessibility of conversational speech and oral history archives.
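The abstract evaluates search against transcripts scored by word error rate. As a reference point, the sketch below computes WER as the word-level Levenshtein distance between a hypothesis and a reference transcript, normalized by reference length; it is a generic illustration, not the scoring setup used in the paper, and the example sentences are invented.

```python
# Minimal WER sketch: word-level edit distance / reference length.
# Generic illustration only; not the paper's evaluation pipeline.
def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# Toy usage: one substitution and one deletion against a 6-word reference -> 2/6.
print(wer("she recorded her story last spring", "she recorded the story last"))
```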
2014
The MITLL-AFRL IWSLT 2014 MT system
Michaeel Kazi | Elizabeth Salesky | Brian Thompson | Jessica Ray | Michael Coury | Tim Anderson | Grant Erdmann | Jeremy Gwinnup | Katherine Young | Brian Ore | Michael Hutt
Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign
This report summarizes the MITLL-AFRL MT and ASR systems and the experiments run using them during the 2014 IWSLT evaluation campaign. Our MT system is much improved over last year's, owing to the integration of techniques such as PRO and DREM optimization, factored language models, neural network joint model rescoring, multiple phrase tables, and development set creation. We focused our efforts this year on translating from Arabic, Russian, Chinese, and Farsi into English, as well as from English into French. ASR performance also improved, partly due to increased effort on deep neural networks for hybrid and tandem systems. Work focused on both the English and Italian ASR tasks.
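One of the tuning techniques named in this abstract is PRO (pairwise ranking optimization). The sketch below is a minimal, assumption-laden illustration in the spirit of Hopkins & May (2011): sample hypothesis pairs from each n-best list, keep the pairs with the largest metric-score gaps, and fit log-linear weights with logistic regression on feature-vector differences. The feature layout, sampling constants, and toy n-best list are invented for illustration and are not the MITLL-AFRL implementation.

```python
# Minimal PRO-style weight tuning sketch (Hopkins & May 2011 flavor).
# All constants and the toy data below are illustrative assumptions.
import random
import numpy as np

rng = np.random.default_rng(0)

def sample_pairs(feats, scores, n_samples=5000, n_keep=50, min_gap=0.05):
    """Sample hypothesis pairs from one n-best list; keep the pairs with the
    largest sentence-level metric-score gaps (above min_gap)."""
    n = len(scores)
    pairs = []
    for _ in range(n_samples):
        i, j = rng.integers(n), rng.integers(n)
        gap = scores[i] - scores[j]
        if abs(gap) >= min_gap:
            pairs.append((abs(gap), feats[i] - feats[j], 1.0 if gap > 0 else -1.0))
    pairs.sort(key=lambda p: -p[0])
    return [(x, y) for _, x, y in pairs[:n_keep]]

def train_weights(nbest_lists, dim, epochs=20, lr=0.1):
    """Fit log-linear weights so a logistic classifier on feature differences
    prefers the hypothesis with the higher metric score."""
    w = np.zeros(dim)
    data = []
    for feats, scores in nbest_lists:
        data.extend(sample_pairs(np.asarray(feats, float), np.asarray(scores, float)))
    for _ in range(epochs):
        random.shuffle(data)
        for x, y in data:
            # gradient step on the logistic loss with label y in {-1, +1}
            w += lr * y * x / (1.0 + np.exp(y * np.dot(w, x)))
    return w

# Toy usage: two features (say, TM and LM scores) and one 3-hypothesis n-best
# list with per-hypothesis sentence-level metric scores.
nbest = [([[0.2, 1.0], [0.9, 0.1], [0.5, 0.5]], [0.30, 0.70, 0.55])]
print(train_weights(nbest, dim=2))
```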
2013
The MIT-LL/AFRL IWSLT-2013 MT system
Michaeel Kazi | Michael Coury | Elizabeth Salesky | Jessica Ray | Wade Shen | Terry Gleason | Tim Anderson | Grant Erdmann | Lane Schwartz | Brian Ore | Raymond Slyh | Jeremy Gwinnup | Katherine Young | Michael Hutt
Proceedings of the 10th International Workshop on Spoken Language Translation: Evaluation Campaign
This paper describes the MIT-LL/AFRL statistical MT system and the improvements developed during the IWSLT 2013 evaluation campaign [1]. As part of these efforts, we experimented with a number of extensions to the standard phrase-based model that improve performance on the Russian-to-English, Chinese-to-English, Arabic-to-English, and English-to-French TED-talk translation tasks. We also applied our existing ASR system to the TED-talk lecture ASR task. We discuss the architecture of the MIT-LL/AFRL MT system, improvements over our 2012 system, and the experiments we ran during the IWSLT 2013 evaluation. Specifically, we focus on 1) cross-entropy filtering of MT training data, 2) improved optimization techniques, 3) language modeling, and 4) approximation of out-of-vocabulary words.
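The first focus named in this abstract, cross-entropy filtering of MT training data, is commonly done in the style of Moore & Lewis (2010): score each candidate sentence by the difference in per-word cross-entropy under an in-domain and a general-domain language model, and keep the lowest-scoring sentences. The sketch below is a simplified stand-in (unigram models with add-one smoothing, an arbitrary keep fraction, invented toy corpora) rather than the paper's actual filtering setup.

```python
# Minimal cross-entropy difference filtering sketch (Moore & Lewis 2010 flavor).
# Unigram LMs, the keep fraction, and the toy corpora are illustrative assumptions.
import math
from collections import Counter

def unigram_lm(corpus):
    """Add-one-smoothed unigram model: returns a log-probability function over words."""
    counts = Counter(w for sent in corpus for w in sent.split())
    total = sum(counts.values())
    vocab = len(counts) + 1
    return lambda w: math.log((counts[w] + 1) / (total + vocab))

def cross_entropy(logprob, sentence):
    """Per-word cross-entropy of a sentence under the given model."""
    words = sentence.split()
    return -sum(logprob(w) for w in words) / max(len(words), 1)

def filter_by_xent_diff(candidates, in_domain, general, keep_fraction=0.5):
    """Rank candidates by H_in - H_gen (lower = more in-domain-like) and keep the top slice."""
    lm_in, lm_gen = unigram_lm(in_domain), unigram_lm(general)
    scored = sorted(candidates,
                    key=lambda s: cross_entropy(lm_in, s) - cross_entropy(lm_gen, s))
    return scored[:int(len(scored) * keep_fraction)]

# Toy usage: tiny corpora standing in for TED-style in-domain text and a
# larger general-domain pool of candidate training sentences.
in_dom = ["so today i want to talk about ideas", "thank you so much"]
general = ["the committee adopted the resolution", "shares fell in early trading",
           "so today we talk about markets"]
print(filter_by_xent_diff(general, in_dom, general))
```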