Translating Between Morphologically Rich Languages: An Arabic-to-Turkish Machine Translation System

İlknur Durgar El-Kahlout, Emre Bektaş, Naime Şeyma Erdem, Hamza Kaya


Abstract
This paper describes our work on building a machine translation system for Arabic-to-Turkish in the news domain. Our work includes collecting parallel datasets in several ways for a new and low-resourced language pair, building baseline systems with state-of-the-art architectures, and developing language-specific algorithms for better translation. Parallel datasets are mainly collected in three different ways: i) translating Arabic texts into Turkish by professional translators, ii) exploiting the web for open-source Arabic-Turkish parallel texts, and iii) using back-translation. We performed preliminary experiments for Arabic-to-Turkish machine translation with neural (Marian) machine translation tools and a novel morphologically motivated vocabulary reduction method.
Anthology ID:
W19-4617
Volume:
Proceedings of the Fourth Arabic Natural Language Processing Workshop
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Wassim El-Hajj, Lamia Hadrich Belguith, Fethi Bougares, Walid Magdy, Imed Zitouni, Nadi Tomeh, Mahmoud El-Haj, Wajdi Zaghouani
Venue:
WANLP
Publisher:
Association for Computational Linguistics
Pages:
158–166
URL:
https://aclanthology.org/W19-4617
DOI:
10.18653/v1/W19-4617
Cite (ACL):
İlknur Durgar El-Kahlout, Emre Bektaş, Naime Şeyma Erdem, and Hamza Kaya. 2019. Translating Between Morphologically Rich Languages: An Arabic-to-Turkish Machine Translation System. In Proceedings of the Fourth Arabic Natural Language Processing Workshop, pages 158–166, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Translating Between Morphologically Rich Languages: An Arabic-to-Turkish Machine Translation System (Durgar El-Kahlout et al., WANLP 2019)
PDF:
https://aclanthology.org/W19-4617.pdf
Data
OpenSubtitles