Towards Improving Low-Resource Speech Recognition Using Articulatory and Language Features

Markus Müller, Sebastian Stüker, Alex Waibel


Abstract
In an increasingly globalized world, there is a rising demand for speech recognition systems. Systems for languages like English, German or French achieve decent performance, but there is a long tail of languages for which such systems do not yet exist. State-of-the-art speech recognition systems feature Deep Neural Networks (DNNs). Because DNNs are a data-driven method and therefore highly dependent on sufficient training data, a lack of resources directly affects recognition performance. Multiple techniques exist to deal with such resource-constrained conditions; one approach is the use of additional data from other languages. In the past, it was demonstrated that multilingually trained systems benefit from adding language feature vectors (LFVs) to the input features, similar to i-Vectors. In this work, we extend this approach by adding articulatory features (AFs). We show that AFs also benefit from LFVs and that multilingual system setups benefit from adding both AFs and LFVs. Treating English as a low-resource language, we restricted ourselves to only 10 hours of English acoustic training data. For system training, we use additional data from French, German and Turkish. By using a combination of AFs and LFVs, we were able to decrease the word error rate (WER) from 18.1% to 17.3% after system combination in our setup using a multilingual phone set.
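To put the reported improvement in perspective, the following minimal sketch computes the absolute and relative WER reduction from the two figures given in the abstract. The function name `relative_wer_reduction` is a hypothetical helper introduced here for illustration and is not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Return the relative WER reduction in percent (hypothetical helper)."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - improved_wer) / baseline_wer * 100.0


# WER values in percent, as stated in the abstract.
baseline = 18.1
improved = 17.3

absolute = baseline - improved  # reduction in percentage points
relative = relative_wer_reduction(baseline, improved)

print(f"absolute: {absolute:.1f} points, relative: {relative:.1f}%")
```

Under these figures, the reduction is 0.8 percentage points absolute, which corresponds to roughly 4.4% relative to the baseline.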
Anthology ID:
2016.iwslt-1.9
Volume:
Proceedings of the 13th International Conference on Spoken Language Translation
Month:
December 8-9
Year:
2016
Address:
Seattle, Washington D.C
Editors:
Mauro Cettolo, Jan Niehues, Sebastian Stüker, Luisa Bentivogli, Rolando Cattoni, Marcello Federico
Venue:
IWSLT
SIG:
SIGSLT
Publisher:
International Workshop on Spoken Language Translation
URL:
https://aclanthology.org/2016.iwslt-1.9
Cite (ACL):
Markus Müller, Sebastian Stüker, and Alex Waibel. 2016. Towards Improving Low-Resource Speech Recognition Using Articulatory and Language Features. In Proceedings of the 13th International Conference on Spoken Language Translation, Seattle, Washington D.C. International Workshop on Spoken Language Translation.
Cite (Informal):
Towards Improving Low-Resource Speech Recognition Using Articulatory and Language Features (Müller et al., IWSLT 2016)
PDF:
https://aclanthology.org/2016.iwslt-1.9.pdf