Machine Translation of Omani Arabic Dialect from Social Media

Khoula Al-Kharusi, Abdurahman AAlAbdulsalam


Abstract
Research on Machine Translation (MT) between Modern Standard Arabic (MSA) and English is abundant. However, studies on MT between Omani Arabic (OA) dialects and English are very scarce. This study addresses the lack of an Omani dialect parallel dataset, as well as the MT of OA into English. It uses social media data from X (formerly Twitter) to build an authentic parallel corpus of the Omani dialects, and presents baseline results on this dataset using Google Translate, Microsoft Translation, and Marian NMT. A taxonomy of the most common linguistic errors is used to analyze the translations produced by the NMT systems and to provide insights for future improvements. Finally, transfer learning is used to adapt Marian NMT to the Omani dialect, which significantly improved its BLEU score, by 9.88 points.
Anthology ID:
2023.arabicnlp-1.24
Volume:
Proceedings of ArabicNLP 2023
Month:
December
Year:
2023
Address:
Singapore (Hybrid)
Editors:
Hassan Sawaf, Samhaa El-Beltagy, Wajdi Zaghouani, Walid Magdy, Ahmed Abdelali, Nadi Tomeh, Ibrahim Abu Farha, Nizar Habash, Salam Khalifa, Amr Keleg, Hatem Haddad, Imed Zitouni, Khalil Mrini, Rawan Almatham
Venues:
ArabicNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
302–309
URL:
https://aclanthology.org/2023.arabicnlp-1.24
DOI:
10.18653/v1/2023.arabicnlp-1.24
Cite (ACL):
Khoula Al-Kharusi and Abdurahman AAlAbdulsalam. 2023. Machine Translation of Omani Arabic Dialect from Social Media. In Proceedings of ArabicNLP 2023, pages 302–309, Singapore (Hybrid). Association for Computational Linguistics.
Cite (Informal):
Machine Translation of Omani Arabic Dialect from Social Media (Al-Kharusi & AAlAbdulsalam, ArabicNLP-WS 2023)
PDF:
https://aclanthology.org/2023.arabicnlp-1.24.pdf
Video:
https://aclanthology.org/2023.arabicnlp-1.24.mp4