Garnishing a phonetic dictionary for ASR intake

Iben Nyholm Debess, Sandra Saxov Lamhauge, Peter Juel Henrichsen


Abstract
We present a new method for preparing a lexical-phonetic database as a resource for acoustic model training. The research is an offshoot of the ongoing Project Ravnur (Speech Recognition for Faroese), but the method is language-independent. At NODALIDA 2019 we demonstrate the method (called SHARP) online, showing how a traditional lexical-phonetic dictionary (with a very rich phone inventory) is transformed into an ASR-friendly database (with a reduced phone inventory that prevents data sparseness). The mapping procedure is informed by a corpus of speech transcripts. We conclude with a discussion of the benefits of a well-thought-out BLARK (Basic Language Resource Kit) design, which makes tools like SHARP possible.
Anthology ID: W19-6147
Volume: Proceedings of the 22nd Nordic Conference on Computational Linguistics
Month: September–October
Year: 2019
Address: Turku, Finland
Venues: NoDaLiDa | WS
Publisher: Linköping University Electronic Press
Pages: 395–399
URL: https://aclanthology.org/W19-6147
PDF: https://aclanthology.org/W19-6147.pdf