Comparing Formulaic Language in Human and Machine Translation: Insight from a Parliamentary Corpus

Yves Bestgen


Abstract
A recent study has shown that, compared to human translations, neural machine translations contain more strongly-associated formulaic sequences made of relatively high-frequency words, but far fewer strongly-associated formulaic sequences made of relatively rare words. These results were obtained from translations of quality newspaper articles, a genre in which human translations are thought to be not very literal. The present study attempts to replicate this research using a parliamentary corpus. The results confirm the observations made on the news corpus, but the differences are smaller. They suggest that text genres that usually result in more literal translations, such as parliamentary corpora, might be preferable when comparing human and machine translations.
Anthology ID:
2022.parlaclarin-1.14
Volume:
Proceedings of the Workshop ParlaCLARIN III within the 13th Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Darja Fišer, Maria Eskevich, Jakob Lenardič, Franciska de Jong
Venue:
ParlaCLARIN
Publisher:
European Language Resources Association
Pages:
101–106
URL:
https://aclanthology.org/2022.parlaclarin-1.14
Cite (ACL):
Yves Bestgen. 2022. Comparing Formulaic Language in Human and Machine Translation: Insight from a Parliamentary Corpus. In Proceedings of the Workshop ParlaCLARIN III within the 13th Language Resources and Evaluation Conference, pages 101–106, Marseille, France. European Language Resources Association.
Cite (Informal):
Comparing Formulaic Language in Human and Machine Translation: Insight from a Parliamentary Corpus (Bestgen, ParlaCLARIN 2022)
PDF:
https://aclanthology.org/2022.parlaclarin-1.14.pdf