The SETU-DCU Submissions to IWSLT 2024 Low-Resource Speech-to-Text Translation Tasks

Maria Zafar, Antonio Castaldo, Prashanth Nayak, Rejwanul Haque, Neha Gajakos, Andy Way


Abstract
Natural Language Processing (NLP) research and development has progressed rapidly in recent years, driven by advances in deep learning. Pre-trained large language models (LLMs) are at the core of this transformation and have significantly improved the performance of machine translation (MT) and speech technologies. This development has also led to fundamental changes in modern translation and speech tools and their methodologies. However, extending this progress to underrepresented dialects and low-resource languages remains challenging, primarily because of data scarcity. This paper details our submissions to the IWSLT 2024 low-resource speech translation (ST) tasks. We built cascaded systems: the Whisper model served as the automatic speech recognition (ASR) component, and mBART and NLLB served as the MT component. Our research primarily focused on exploring various dialects of low-resource languages and harnessing existing resources from linguistically related languages. We conducted our experiments on two morphologically diverse language pairs: Irish-to-English and Maltese-to-English. We used BLEU, chrF and COMET to evaluate our MT models.
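The cascaded design described in the abstract (ASR output fed into a separate MT model) can be illustrated with a minimal sketch. This is not the authors' implementation: the class `CascadedST`, the type aliases, the stand-in functions `toy_asr` and `toy_mt`, and the file name `ga_001.wav` are all hypothetical and exist only to show how the two stages compose. In a real system the two callables would wrap a Whisper model and an mBART or NLLB model.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type aliases: any ASR or MT backend can be plugged in.
Transcriber = Callable[[str], str]  # audio path -> source-language transcript
Translator = Callable[[str], str]   # source-language text -> English text


@dataclass
class CascadedST:
    """Cascaded speech translation: ASR first, then text-to-text MT."""

    transcribe: Transcriber
    translate: Translator

    def __call__(self, audio_path: str) -> dict:
        transcript = self.transcribe(audio_path)
        translation = self.translate(transcript)
        # Keep the intermediate transcript so that ASR errors can be
        # inspected separately from MT errors.
        return {"transcript": transcript, "translation": translation}


# Stand-in components for illustration only (assumptions, not real models).
def toy_asr(audio_path: str) -> str:
    return {"ga_001.wav": "dia duit"}.get(audio_path, "")


def toy_mt(text: str) -> str:
    return {"dia duit": "hello"}.get(text, "")


pipeline = CascadedST(transcribe=toy_asr, translate=toy_mt)
result = pipeline("ga_001.wav")
```

Keeping the two stages behind plain callables means the MT model can be swapped (for example, mBART for NLLB) without touching the ASR stage, which is the main practical appeal of a cascade over an end-to-end model.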
Anthology ID:
2024.iwslt-1.12
Volume:
Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024)
Month:
August
Year:
2024
Address:
Bangkok, Thailand (in-person and online)
Editors:
Elizabeth Salesky, Marcello Federico, Marine Carpuat
Venue:
IWSLT
Publisher:
Association for Computational Linguistics
Pages:
80–85
URL:
https://aclanthology.org/2024.iwslt-1.12
Cite (ACL):
Maria Zafar, Antonio Castaldo, Prashanth Nayak, Rejwanul Haque, Neha Gajakos, and Andy Way. 2024. The SETU-DCU Submissions to IWSLT 2024 Low-Resource Speech-to-Text Translation Tasks. In Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024), pages 80–85, Bangkok, Thailand (in-person and online). Association for Computational Linguistics.
Cite (Informal):
The SETU-DCU Submissions to IWSLT 2024 Low-Resource Speech-to-Text Translation Tasks (Zafar et al., IWSLT 2024)
PDF:
https://aclanthology.org/2024.iwslt-1.12.pdf