Tandem Long-Short Duration-based Modeling for Automatic Speech Recognition

Dalai Mengke, Yan Meng, Peter Mihajlik


Abstract
This study outlines our duration-dependent modeling experiments on limited-resource Hungarian speech recognition tasks. As is well known, very short utterances pose significant challenges in automatic speech recognition due to the lack of context and other factors. In particular, we found that excluding shorter speech samples from fine-tuning significantly improves the recognition rate on longer-duration test data, as measured on the public Hungarian datasets BEA-Base and CommonVoice (CV). We therefore apply a tandem modeling approach in which separate models are used for short- and long-duration test data. Our strategy improved the recognition of short utterances while maintaining efficient recognition of long utterances, leading to a significant increase in overall recognition accuracy.
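The routing logic behind such a tandem setup can be summarized in a short sketch: training data for the long-duration model is filtered by duration, and each test utterance is dispatched to the model matching its length. The 5-second threshold and the model names below are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch of tandem long-short duration routing, assuming a fixed
# duration threshold and two already fine-tuned models. The threshold and
# model placeholders are hypothetical, for illustration only.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Utterance:
    audio_path: str
    duration_sec: float


def filter_long(utterances: List[Utterance], min_dur: float = 5.0) -> List[Utterance]:
    """Drop short samples before fine-tuning the long-duration model."""
    return [u for u in utterances if u.duration_sec >= min_dur]


def tandem_transcribe(
    utt: Utterance,
    short_model: Callable[[str], str],
    long_model: Callable[[str], str],
    threshold_sec: float = 5.0,
) -> str:
    """Route each test utterance to the model matching its duration."""
    model = short_model if utt.duration_sec < threshold_sec else long_model
    return model(utt.audio_path)


if __name__ == "__main__":
    # Placeholder models standing in for the fine-tuned recognizers.
    short_model = lambda path: f"<short-model transcript of {path}>"
    long_model = lambda path: f"<long-model transcript of {path}>"
    test = [Utterance("a.wav", 2.3), Utterance("b.wav", 11.8)]
    for u in test:
        print(tandem_transcribe(u, short_model, long_model))
```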
Anthology ID:
2024.sigul-1.40
Volume:
Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Maite Melero, Sakriani Sakti, Claudia Soria
Venues:
SIGUL | WS
Publisher:
ELRA and ICCL
Pages:
331–336
URL:
https://aclanthology.org/2024.sigul-1.40
Cite (ACL):
Dalai Mengke, Yan Meng, and Peter Mihajlik. 2024. Tandem Long-Short Duration-based Modeling for Automatic Speech Recognition. In Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024, pages 331–336, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Tandem Long-Short Duration-based Modeling for Automatic Speech Recognition (Mengke et al., SIGUL-WS 2024)
PDF:
https://aclanthology.org/2024.sigul-1.40.pdf