The RATS Collection: Supporting HLT Research with Degraded Audio Data

David Graff, Kevin Walker, Stephanie Strassel, Xiaoyi Ma, Karen Jones, Ann Sawyer

Abstract

The DARPA RATS program was established to foster development of language technology systems that can perform well on speaker-to-speaker communications over radio channels that evince a wide range in the type and extent of signal variability and acoustic degradation. Creating suitable corpora to address this need poses an equally wide range of challenges for the collection, annotation and quality assessment of relevant data. This paper describes the LDC's multi-year effort to build the RATS data collection, summarizes the content and properties of the resulting corpora, and discusses the novel problems and approaches involved in ensuring that the data would satisfy its intended use, namely to provide speech recordings and annotations for training and evaluating HLT systems that perform four specific tasks on difficult radio channels: Speech Activity Detection (SAD), Language Identification (LID), Speaker Identification (SID) and Keyword Spotting (KWS).

Anthology ID: L14-1089
Volume: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month: May
Year: 2014
Address: Reykjavik, Iceland
Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association (ELRA)
Pages: 1970–1977
URL: http://www.lrec-conf.org/proceedings/lrec2014/pdf/1125_Paper.pdf
Cite (ACL): David Graff, Kevin Walker, Stephanie Strassel, Xiaoyi Ma, Karen Jones, and Ann Sawyer. 2014. The RATS Collection: Supporting HLT Research with Degraded Audio Data. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 1970–1977, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal): The RATS Collection: Supporting HLT Research with Degraded Audio Data (Graff et al., LREC 2014)
PDF: http://www.lrec-conf.org/proceedings/lrec2014/pdf/1125_Paper.pdf
