Corpus Creation and Evaluation for Speech-to-Text and Speech Translation

Corey Miller, Evelyne Tzoukermann, Jennifer Doyon, Elizabeth Mallard


Abstract
The National Virtual Translation Center (NVTC) seeks to acquire human language technology (HLT) tools that will facilitate its mission to provide verbatim English translations of foreign language audio and video files. In the text domain, NVTC has been using translation memory (TM) for some time and has reported on the incorporation of machine translation (MT) into that workflow (Miller et al., 2020). While we have explored the use of speech-to-text (STT) and speech translation (ST) in the past (Tzoukermann and Miller, 2018), we have now invested in the creation of a substantial human-made corpus to thoroughly evaluate alternatives. Results from our analysis of this corpus and the performance of HLT tools point the way to the most promising ones to deploy in our workflow.
Anthology ID:
2021.mtsummit-up.6
Volume:
Proceedings of Machine Translation Summit XVIII: Users and Providers Track
Month:
August
Year:
2021
Address:
Virtual
Editors:
Janice Campbell, Ben Huyck, Stephen Larocca, Jay Marciano, Konstantin Savenkov, Alex Yanishevsky
Venue:
MTSummit
Publisher:
Association for Machine Translation in the Americas
Pages:
44–53
URL:
https://aclanthology.org/2021.mtsummit-up.6
Cite (ACL):
Corey Miller, Evelyne Tzoukermann, Jennifer Doyon, and Elizabeth Mallard. 2021. Corpus Creation and Evaluation for Speech-to-Text and Speech Translation. In Proceedings of Machine Translation Summit XVIII: Users and Providers Track, pages 44–53, Virtual. Association for Machine Translation in the Americas.
Cite (Informal):
Corpus Creation and Evaluation for Speech-to-Text and Speech Translation (Miller et al., MTSummit 2021)
PDF:
https://aclanthology.org/2021.mtsummit-up.6.pdf