Automatic Discrimination of Human and Neural Machine Translation in Multilingual Scenarios

Malina Chichirau, Rik van Noord, Antonio Toral


Abstract
We tackle the task of automatically discriminating between human and machine translations. As opposed to most previous work, we perform experiments in a multilingual setting, considering multiple languages and multilingual pretrained language models. We show that a classifier trained on parallel data with a single source language (in our case German–English) can still perform well on English translations that come from different source languages, even when the machine translations were produced by systems other than the one it was trained on. Additionally, we demonstrate that incorporating the source text in the input of a multilingual classifier improves (i) its accuracy and (ii) its robustness on cross-system evaluation, compared to a monolingual classifier. Furthermore, we find that using training data from multiple source languages (German, Russian and Chinese) tends to improve the accuracy of both monolingual and multilingual classifiers. Finally, we show that bilingual classifiers and classifiers trained on multiple source languages benefit from being trained on longer text sequences, rather than on sentences.
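To make the input settings named in the abstract concrete, the following is a minimal sketch of how classifier inputs might be assembled. It is not the authors' code: the `Example` class, the function names, the label convention (0 = human, 1 = machine), and the choice to join sentences with a space are all assumptions made for illustration. The monolingual setting is assumed to see only the English translation, the multilingual setting is assumed to see the source text paired with the translation, and longer sequences are assumed to be built by concatenating consecutive sentences that share a label.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Example:
    """One training instance (hypothetical structure, not from the paper)."""
    source: str       # source-language text, e.g. German
    translation: str  # English translation (human or machine)
    label: int        # assumed convention: 0 = human, 1 = machine


def monolingual_input(ex: Example) -> Tuple[str, Optional[str]]:
    # A monolingual classifier is assumed to see only the translation.
    return (ex.translation, None)


def multilingual_input(ex: Example) -> Tuple[str, Optional[str]]:
    # A multilingual classifier is assumed to see source and translation
    # as a text pair, which pair-aware tokenizers can encode together.
    return (ex.source, ex.translation)


def merge_sentences(examples: List[Example], size: int) -> List[Example]:
    """Concatenate every `size` consecutive sentences into one longer example.

    All sentences in a group must share a label, otherwise the merged
    example would have no well-defined class.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    merged = []
    for start in range(0, len(examples), size):
        group = examples[start:start + size]
        if len({ex.label for ex in group}) != 1:
            raise ValueError("cannot merge sentences with different labels")
        merged.append(
            Example(
                source=" ".join(ex.source for ex in group),
                translation=" ".join(ex.translation for ex in group),
                label=group[0].label,
            )
        )
    return merged
```

The returned tuples are meant to be passed to whatever tokenizer the chosen pretrained model uses; the second element is `None` when no source text is supplied. The final group produced by `merge_sentences` may be shorter than `size` when the number of sentences is not a multiple of it.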
Anthology ID: 2023.eamt-1.21
Volume: Proceedings of the 24th Annual Conference of the European Association for Machine Translation
Month: June
Year: 2023
Address: Tampere, Finland
Editors: Mary Nurminen, Judith Brenner, Maarit Koponen, Sirkku Latomaa, Mikhail Mikhailov, Frederike Schierl, Tharindu Ranasinghe, Eva Vanmassenhove, Sergi Alvarez Vidal, Nora Aranberri, Mara Nunziatini, Carla Parra Escartín, Mikel Forcada, Maja Popovic, Carolina Scarton, Helena Moniz
Venue: EAMT
Publisher: European Association for Machine Translation
Pages: 217–226
URL: https://aclanthology.org/2023.eamt-1.21
Cite (ACL): Malina Chichirau, Rik van Noord, and Antonio Toral. 2023. Automatic Discrimination of Human and Neural Machine Translation in Multilingual Scenarios. In Proceedings of the 24th Annual Conference of the European Association for Machine Translation, pages 217–226, Tampere, Finland. European Association for Machine Translation.
Cite (Informal): Automatic Discrimination of Human and Neural Machine Translation in Multilingual Scenarios (Chichirau et al., EAMT 2023)
PDF: https://aclanthology.org/2023.eamt-1.21.pdf