LiSTra Automatic Speech Translation: English to Lingala Case Study

Salomon Kabongo Kabenamualu, Vukosi Marivate, Herman Kamper


Abstract
In recent years there has been great interest in addressing the data scarcity of African languages and providing baseline models for different Natural Language Processing (NLP) tasks (Orife et al., 2020). Several initiatives on the continent (Nekoto et al., 2020) use the Bible as a data source to provide proof of concept for some NLP tasks. In this work, we present the Lingala Speech Translation (LiSTra) dataset, release a full pipeline for constructing such datasets in other languages, and report baselines using both the traditional cascade approach (Automatic Speech Recognition followed by Machine Translation) and a transformer-based end-to-end architecture (Liu et al., 2020) with a custom interactive attention that allows information sharing between the recognition decoder and the translation decoder.
Anthology ID: 2022.dclrl-1.8
Volume: Proceedings of the Workshop on Dataset Creation for Lower-Resourced Languages within the 13th Language Resources and Evaluation Conference
Month: June
Year: 2022
Address: Marseille, France
Editors: Jonne Sälevä, Constantine Lignos
Venue: DCLRL
Publisher: European Language Resources Association
Pages: 63–67
URL: https://aclanthology.org/2022.dclrl-1.8
Cite (ACL): Salomon Kabongo Kabenamualu, Vukosi Marivate, and Herman Kamper. 2022. LiSTra Automatic Speech Translation: English to Lingala Case Study. In Proceedings of the Workshop on Dataset Creation for Lower-Resourced Languages within the 13th Language Resources and Evaluation Conference, pages 63–67, Marseille, France. European Language Resources Association.
Cite (Informal): LiSTra Automatic Speech Translation: English to Lingala Case Study (Kabongo Kabenamualu et al., DCLRL 2022)
PDF: https://aclanthology.org/2022.dclrl-1.8.pdf
Data: JW300