Machine Translated Text Detection Through Text Similarity with Round-Trip Translation

Hoang-Quoc Nguyen-Son, Tran Thao, Seira Hidano, Ishita Gupta, Shinsaku Kiyomoto


Abstract
Translated texts have been used for malicious purposes such as plagiarism and fake reviews. Existing detectors are built around a specific translator (e.g., Google) and fail to detect a translated text produced by a strange (i.e., unfamiliar) translator. When the same translator is used, a translated text is similar to its round-trip translation, in which the text is translated into another language and then back into the original language. In contrast, a round-trip translated text differs significantly from the original text or from a text translated by a strange translator. Hence, we propose a detector using text similarity with round-trip translation (TSRT). TSRT achieves 86.9% accuracy in detecting a translated text from a strange translator, outperforming existing detectors (77.9%) and human recognition (53.3%).
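The round-trip idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the translator callables (`toy_forward`, `toy_backward`), the use of `difflib.SequenceMatcher` as the similarity measure, and the fixed `threshold` are all assumptions made for illustration. The abstract does not specify the similarity features or the classifier that TSRT uses.

```python
from difflib import SequenceMatcher
from typing import Callable

# Hypothetical type: any function mapping text in one language to another.
Translator = Callable[[str], str]


def round_trip(text: str, forward: Translator, backward: Translator) -> str:
    """Translate text into an intermediate language and back again."""
    return backward(forward(text))


def round_trip_similarity(text: str, forward: Translator, backward: Translator) -> float:
    """Token-level similarity in [0, 1] between text and its round-trip translation."""
    restored = round_trip(text, forward, backward)
    return SequenceMatcher(None, text.split(), restored.split()).ratio()


def looks_translated(text: str, forward: Translator, backward: Translator,
                     threshold: float = 0.8) -> bool:
    """Flag text that stays nearly unchanged under round-trip translation.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return round_trip_similarity(text, forward, backward) >= threshold


# Toy stand-ins for a real translator: synonyms collapse onto one pivot token,
# and each pivot token maps back to a single preferred word.
TO_PIVOT = {"buy": "X_BUY", "purchase": "X_BUY", "movie": "X_MOVIE", "film": "X_MOVIE"}
FROM_PIVOT = {"X_BUY": "buy", "X_MOVIE": "movie"}


def toy_forward(text: str) -> str:
    return " ".join(TO_PIVOT.get(word, word) for word in text.split())


def toy_backward(text: str) -> str:
    return " ".join(FROM_PIVOT.get(word, word) for word in text.split())


if __name__ == "__main__":
    for sample in ("i will buy the movie", "i will purchase the film"):
        score = round_trip_similarity(sample, toy_forward, toy_backward)
        print(sample, round(score, 2), looks_translated(sample, toy_forward, toy_backward))
```

With these toy translators, the first sample already uses the translator's preferred wording, so it survives the round trip unchanged (similarity 1.0) and is flagged. The second sample is reworded by the round trip (similarity 0.6) and is not flagged. In practice the callables would wrap a real translation system, and a trained classifier would likely replace the fixed threshold.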
Anthology ID: 2021.naacl-main.462
Volume: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month: June
Year: 2021
Address: Online
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 5792–5797
URL: https://aclanthology.org/2021.naacl-main.462
DOI: 10.18653/v1/2021.naacl-main.462
Cite (ACL): Hoang-Quoc Nguyen-Son, Tran Thao, Seira Hidano, Ishita Gupta, and Shinsaku Kiyomoto. 2021. Machine Translated Text Detection Through Text Similarity with Round-Trip Translation. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 5792–5797, Online. Association for Computational Linguistics.
Cite (Informal): Machine Translated Text Detection Through Text Similarity with Round-Trip Translation (Nguyen-Son et al., NAACL 2021)
PDF: https://aclanthology.org/2021.naacl-main.462.pdf
Video: https://aclanthology.org/2021.naacl-main.462.mp4
Code: quocnsh/machine_translation_detection