Language Related Issues for Machine Translation between Closely Related South Slavic Languages

Maja Popović, Mihael Arčan, Filip Klubička


Abstract
Machine translation between closely related languages is less challenging and exhibits fewer translation errors than translation between distant languages, but there are still obstacles which should be addressed in order to improve such systems. This work explores the obstacles for machine translation systems between closely related South Slavic languages, namely Croatian, Serbian and Slovenian. Statistical systems for all language pairs and translation directions are trained on parallel texts from different domains, though mainly on spoken language, i.e. subtitles. For translation between Serbian and Croatian, a rule-based system is also explored. It is shown that for all language pairs and translation systems, the main obstacles are differences between structural properties.
Anthology ID:
W16-4806
Volume:
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)
Month:
December
Year:
2016
Address:
Osaka, Japan
Editors:
Preslav Nakov, Marcos Zampieri, Liling Tan, Nikola Ljubešić, Jörg Tiedemann, Shervin Malmasi
Venue:
VarDial
Publisher:
The COLING 2016 Organizing Committee
Pages:
43–52
URL:
https://aclanthology.org/W16-4806
Cite (ACL):
Maja Popović, Mihael Arčan, and Filip Klubička. 2016. Language Related Issues for Machine Translation between Closely Related South Slavic Languages. In Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3), pages 43–52, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
Language Related Issues for Machine Translation between Closely Related South Slavic Languages (Popović et al., VarDial 2016)
PDF:
https://aclanthology.org/W16-4806.pdf