Data Augmentation for Speech-Based Diacritic Restoration

Sara Shatnawi, Sawsan Alqahtani, Shady Shehata, Hanan Aldarmaki


Abstract
This paper describes a data augmentation technique for boosting the performance of speech-based diacritic restoration. Our experiments demonstrate the utility of this approach, resulting in improved generalization of all models across different test sets. In addition, we describe the first multi-modal diacritic restoration model, utilizing both speech and text as input modalities. This type of model can be used to diacritize speech transcripts. Unlike previous work that relies on an external ASR model, the proposed model is far more compact and efficient. While the multi-modal framework does not surpass the ASR-based model for this task, it offers a promising approach for improving the efficiency of speech-based diacritization, with a potential for improvement using data augmentation and other methods.
Anthology ID:
2024.arabicnlp-1.15
Volume:
Proceedings of The Second Arabic Natural Language Processing Conference
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Nizar Habash, Houda Bouamor, Ramy Eskander, Nadi Tomeh, Ibrahim Abu Farha, Ahmed Abdelali, Samia Touileb, Injy Hamed, Yaser Onaizan, Bashar Alhafni, Wissam Antoun, Salam Khalifa, Hatem Haddad, Imed Zitouni, Badr AlKhamissi, Rawan Almatham, Khalil Mrini
Venues:
ArabicNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
160–169
URL:
https://aclanthology.org/2024.arabicnlp-1.15
Cite (ACL):
Sara Shatnawi, Sawsan Alqahtani, Shady Shehata, and Hanan Aldarmaki. 2024. Data Augmentation for Speech-Based Diacritic Restoration. In Proceedings of The Second Arabic Natural Language Processing Conference, pages 160–169, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Data Augmentation for Speech-Based Diacritic Restoration (Shatnawi et al., ArabicNLP-WS 2024)
PDF:
https://aclanthology.org/2024.arabicnlp-1.15.pdf