Data Augmentation for Maltese NLP using Transliterated and Machine Translated Arabic Data

Kurt Micallef, Nizar Habash, Claudia Borg


Abstract
Maltese is a unique Semitic language that has evolved under extensive influence from Romance and Germanic languages, particularly Italian and English. Despite its Semitic roots, its orthography is based on the Latin script, creating a gap between it and its closest linguistic relatives in Arabic. In this paper, we explore whether Arabic-language resources can support Maltese natural language processing (NLP) through cross-lingual augmentation techniques. We investigate multiple strategies for aligning Arabic textual data with Maltese, including various transliteration schemes and machine translation (MT) approaches. As part of this, we also introduce novel transliteration systems that better represent Maltese orthography. We evaluate the impact of these augmentations on monolingual and multilingual models and demonstrate that Arabic-based augmentation can significantly benefit Maltese NLP tasks.
Anthology ID:
2025.findings-emnlp.1177
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
21580–21590
URL:
https://aclanthology.org/2025.findings-emnlp.1177/
Cite (ACL):
Kurt Micallef, Nizar Habash, and Claudia Borg. 2025. Data Augmentation for Maltese NLP using Transliterated and Machine Translated Arabic Data. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 21580–21590, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Data Augmentation for Maltese NLP using Transliterated and Machine Translated Arabic Data (Micallef et al., Findings 2025)
PDF:
https://aclanthology.org/2025.findings-emnlp.1177.pdf
Checklist:
 2025.findings-emnlp.1177.checklist.pdf