Code-Switching and Back-Transliteration Using a Bilingual Model

Daniel Weisberg Mitelman, Nachum Dershowitz, Kfir Bar


Abstract
The challenges of automated transliteration and code-switching detection in Judeo-Arabic texts are addressed. We introduce two novel machine-learning models, one focused on transliterating Judeo-Arabic into Arabic, and another aimed at identifying non-Arabic words, predominantly Hebrew and Aramaic. Unlike prior work, our models are based on a bilingual Arabic-Hebrew language model, providing a unique advantage in capturing shared linguistic nuances. Evaluation results show that our models outperform prior solutions for the same tasks. As a practical contribution, we present a comprehensive pipeline capable of taking Judeo-Arabic text, identifying non-Arabic words, and then transliterating the Arabic portions into Arabic script. This work not only advances the state of the art but also offers a valuable toolset for making Judeo-Arabic texts more accessible to a broader Arabic-speaking audience.
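For orientation, the sketch below shows one way the described pipeline could be wired together: a token-level classifier flags non-Arabic (Hebrew/Aramaic) spans, and a sequence-to-sequence model back-transliterates the remaining Arabic spans into Arabic script. The Hugging Face `transformers` checkpoint names and the "AR" label scheme are placeholders of ours, not the models or labels released with the paper.

```python
# Minimal sketch of the two-stage pipeline described in the abstract:
# (1) tag code-switched (non-Arabic) spans, (2) back-transliterate the Arabic
# spans into Arabic script. Checkpoint names and the "AR" label are placeholders.
from transformers import pipeline

CS_TAGGER_CKPT = "example/judeo-arabic-codeswitch-tagger"   # hypothetical token classifier
TRANSLIT_CKPT = "example/judeo-arabic-back-transliterator"  # hypothetical seq2seq model

tagger = pipeline("token-classification", model=CS_TAGGER_CKPT,
                  aggregation_strategy="simple")
transliterate = pipeline("text2text-generation", model=TRANSLIT_CKPT)

def back_transliterate(text: str) -> str:
    """Transliterate spans tagged as Arabic; copy Hebrew/Aramaic spans verbatim."""
    pieces, cursor = [], 0
    for span in tagger(text):                      # spans are ordered by character offset
        pieces.append(text[cursor:span["start"]])  # whitespace / untagged text
        chunk = text[span["start"]:span["end"]]
        if span["entity_group"] == "AR":           # assumed label for Arabic spans
            chunk = transliterate(chunk)[0]["generated_text"]
        pieces.append(chunk)
        cursor = span["end"]
    pieces.append(text[cursor:])
    return "".join(pieces)
```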
Anthology ID: 2024.findings-eacl.102
Volume: Findings of the Association for Computational Linguistics: EACL 2024
Month: March
Year: 2024
Address: St. Julian’s, Malta
Editors: Yvette Graham, Matthew Purver
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 1501–1511
URL: https://aclanthology.org/2024.findings-eacl.102
Cite (ACL): Daniel Weisberg Mitelman, Nachum Dershowitz, and Kfir Bar. 2024. Code-Switching and Back-Transliteration Using a Bilingual Model. In Findings of the Association for Computational Linguistics: EACL 2024, pages 1501–1511, St. Julian’s, Malta. Association for Computational Linguistics.
Cite (Informal): Code-Switching and Back-Transliteration Using a Bilingual Model (Weisberg Mitelman et al., Findings 2024)
PDF: https://aclanthology.org/2024.findings-eacl.102.pdf