Operational Assessment of Keyword Search on Oral History

Elizabeth Salesky, Jessica Ray, Wade Shen


Abstract
This project assesses the resources necessary to make oral history searchable by means of automatic speech recognition (ASR). There are many inherent challenges in applying ASR to conversational speech: smaller training set sizes and varying demographics, among others. We assess the impact of dataset size, word error rate and term-weighted value on human search capability through an information retrieval task on Mechanical Turk. We use English oral history data collected by StoryCorps, a national organization that provides all people with the opportunity to record, share and preserve their stories, and control for a variety of demographics including age, gender, birthplace, and dialect on four different training set sizes. We show that search performance with a standard speech recognition system is comparable to that with hand-transcribed data, which is promising for increased accessibility of conversational speech and oral history archives.
Anthology ID:
L16-1049
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
317–321
URL:
https://aclanthology.org/L16-1049/
Cite (ACL):
Elizabeth Salesky, Jessica Ray, and Wade Shen. 2016. Operational Assessment of Keyword Search on Oral History. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 317–321, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Operational Assessment of Keyword Search on Oral History (Salesky et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1049.pdf