Dana Delgado


2020

pdf bib
The SAFE-T Corpus: A New Resource for Simulated Public Safety Communications
Dana Delgado | Kevin Walker | Stephanie Strassel | Karen Jones | Christopher Caruso | David Graff
Proceedings of the Twelfth Language Resources and Evaluation Conference

We introduce a new resource, the SAFE-T (Speech Analysis for Emergency Response Technology) Corpus, designed to simulate first-responder communications by inducing high vocal effort and urgent speech with situational background noise in a game-based collection protocol. Linguistic Data Consortium developed the SAFE-T Corpus to support the NIST (National Institute of Standards and Technology) OpenSAT (Speech Analytic Technologies) evaluation series, whose goal is to advance speech analytic technologies including automatic speech recognition, speech activity detection and keyword search in multiple domains including simulated public safety communications data. The corpus comprises over 300 hours of audio from 115 unique speakers engaged in a collaborative problem-solving activity representative of public safety communications in terms of speech content, noise types and noise levels. Portions of the corpus have been used in the OpenSAT 2019 evaluation and the full corpus will be published in the LDC catalog. We describe the design and implementation of the SAFE-T Corpus collection, discuss the approach of capturing spontaneous speech from study participants through game-based speech collection, and report on the collection results including several challenges associated with the collection.

2019

pdf bib
Corpus Building for Low Resource Languages in the DARPA LORELEI Program
Jennifer Tracey | Stephanie Strassel | Ann Bies | Zhiyi Song | Michael Arrigo | Kira Griffitt | Dana Delgado | Dave Graff | Seth Kulick | Justin Mott | Neil Kuster
Proceedings of the 2nd Workshop on Technologies for MT of Low Resource Languages