Mukhtar Mohamed
2024
From Nile Sands to Digital Hands: Machine Translation of Coptic Texts
Muhammed Saeed
|
Asim Mohamed
|
Mukhtar Mohamed
|
Shady Shehata
|
Muhammad Abdul-Mageed
Proceedings of The Second Arabic Natural Language Processing Conference
The Coptic language, rooted in the historical landscapes of Egypt, continues to serve as a vital liturgical medium for the Coptic Orthodox and Catholic Churches across Egypt, North Sudan, Libya, and the United States, with approximately ten million speakers worldwide. However, the scarcity of digital resources in Coptic has resulted in its exclusion from digital systems, thereby limiting its accessibility and preservation in modern technological contexts. Our research addresses this issue by developing the most extensive parallel Coptic-centered corpus to date. This corpus comprises over 8,000 parallel sentences between Arabic and Coptic, and more than 24,000 parallel sentences between English and Coptic. We have also developed the first neural machine translation system between Coptic, English, and Arabic. Lastly, we evaluate the capability of leading proprietary Large Language Models (LLMs) to translate to and from Coptic using a few-shot learning approach (in-context learning). Our code and data are available at https://github.com/UBC-NLP/copticmt.
Search