Discourse-Related Language Contrasts in English-Croatian Human and Machine Translation

Margita Šoštarić, Christian Hardmeier, Sara Stymne


Abstract
We present an analysis of a number of coreference phenomena in English-Croatian human and machine translations. The aim is to shed light on the differences in how these structurally different languages make use of discourse information, and to provide insights for the development of discourse-aware machine translation systems. The phenomena are automatically identified in parallel data using annotation produced by parsers and word alignment tools, enabling us to pinpoint patterns of interest in both languages. We make the analysis more fine-grained by including three corpora pertaining to three different registers. In a second step, we create a test set with the challenging linguistic constructions and use it to evaluate the performance of three MT systems. We show that both SMT and NMT systems struggle to handle these discourse phenomena, even though NMT tends to perform somewhat better than SMT. By providing an overview of patterns that frequently occur in actual language use, and by pointing out the weaknesses of current MT systems, which commonly mistranslate them, we hope to contribute to better handling of discourse phenomena in MT applications.
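As an illustration of how word alignments and annotation can be combined to pinpoint such patterns, the sketch below flags English pronouns that have no aligned Croatian token. Croatian is a pro-drop language, so subject pronouns are often omitted in translation. This is not the authors' pipeline: the Pharaoh-style "i-j" alignment format, the Penn Treebank `PRP` tag, and the helper names `parse_alignment` and `unaligned_pronouns` are all assumptions made for this example.

```python
# Hypothetical sketch, not the paper's method: find English pronouns that are
# left unaligned in a Croatian translation (candidate dropped pronouns).
# Assumes Pharaoh-style alignments ("src-tgt", 0-indexed) and Penn Treebank tags.

PRONOUN_TAGS = {"PRP"}  # personal pronouns in the Penn Treebank tagset


def parse_alignment(line):
    """Parse an alignment string such as '0-0 1-2' into (src, tgt) index pairs."""
    pairs = []
    for item in line.split():
        src, tgt = item.split("-")
        pairs.append((int(src), int(tgt)))
    return pairs


def unaligned_pronouns(en_tokens, en_tags, alignment_line):
    """Return (index, token) for each English pronoun with no aligned target token."""
    aligned_src = {src for src, _ in parse_alignment(alignment_line)}
    return [
        (i, tok)
        for i, (tok, tag) in enumerate(zip(en_tokens, en_tags))
        if tag in PRONOUN_TAGS and i not in aligned_src
    ]


# Usage with a hand-made (hypothetical) alignment:
# EN: "She said she would come"  /  HR: "Rekla je da će doći"
en_tokens = ["She", "said", "she", "would", "come"]
en_tags = ["PRP", "VBD", "PRP", "MD", "VB"]
print(unaligned_pronouns(en_tokens, en_tags, "1-0 1-1 3-3 4-4"))
# → [(0, 'She'), (2, 'she')]
```

Both English subject pronouns are flagged because the Croatian sentence expresses the subject only through verb agreement. In practice, the alignments and tags would come from automatic tools and would contain errors, so hits like these are candidates for inspection rather than confirmed phenomena.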
Anthology ID:
W18-6305
Volume:
Proceedings of the Third Conference on Machine Translation: Research Papers
Month:
October
Year:
2018
Address:
Brussels, Belgium
Editors:
Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Matt Post, Lucia Specia, Marco Turchi, Karin Verspoor
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
36–48
URL:
https://aclanthology.org/W18-6305
DOI:
10.18653/v1/W18-6305
Cite (ACL):
Margita Šoštarić, Christian Hardmeier, and Sara Stymne. 2018. Discourse-Related Language Contrasts in English-Croatian Human and Machine Translation. In Proceedings of the Third Conference on Machine Translation: Research Papers, pages 36–48, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
Discourse-Related Language Contrasts in English-Croatian Human and Machine Translation (Šoštarić et al., WMT 2018)
PDF:
https://aclanthology.org/W18-6305.pdf