GTNC: A Many-To-One Dataset of Google Translations from NewsCrawl
Damiaan Reijnaers | Charlotte Pouw
Proceedings of the 6th Workshop on Research in Computational Linguistic Typology and Multilingual NLP

This paper lays the groundwork for research into Source Language Identification: the task of identifying the original language of a machine-translated text. We contribute a dataset of translations from a typologically diverse set of languages into English and use it to establish initial baselines for this novel task.


Machine-translated texts from English to Polish show a potential for typological explanations in Source Language Identification
Damiaan Reijnaers | Elize Herrewijnen
Proceedings of the 9th Workshop on Slavic Natural Language Processing 2023 (SlavicNLP 2023)

This case study investigates (1) the feasibility of extracting typological features from Polish texts, and (2) the contrastive power of those features to discriminate between machine-translated texts from English. The findings indicate potential for a proposed method for explainable prediction of the source language of translated texts.