SYSTRAN @ WMT24 Non-Repetitive Translation Task

Marko Avila, Josep Crego


Abstract
Many contemporary NLP systems rely on neural decoders for text generation, which demonstrate an impressive ability to generate text approaching human fluency. However, neural machine translation models often grapple with the production of repetitive content, also known as repetitive diction or word repetition, an issue they were not explicitly trained to address. While not inherently negative, such repetition can make writing seem monotonous or awkward unless used intentionally for emphasis or stylistic effect. This paper presents our submission to the WMT 2024 Non-Repetitive Translation Task, for which we adopt a repetition penalty method applied during training, inspired by the principles of label smoothing; no additional work is needed at inference time. We modify the ground-truth distribution to steer the model towards discouraging repetitions. Experiments show the ability of the proposed method to reduce repetitions within neural machine translation engines without compromising efficiency or translation quality.
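The abstract describes modifying the label-smoothed ground-truth distribution at training time so that previously emitted tokens receive less probability mass. The paper's exact formulation is not given here, so the sketch below is only an illustrative interpretation: it builds per-position smoothed target distributions and down-weights tokens already seen in the reference prefix (the function name and the `epsilon`/`rep_penalty` values are hypothetical).

```python
import numpy as np

def smoothed_targets(target_ids, vocab_size, epsilon=0.1, rep_penalty=0.1):
    """Per-position target distributions with label smoothing, where
    tokens already emitted in the prefix are additionally down-weighted.
    Illustrative sketch only; not the authors' exact formulation."""
    T = len(target_ids)
    # standard label smoothing: spread epsilon over the non-gold tokens
    dist = np.full((T, vocab_size), epsilon / (vocab_size - 1))
    for t, tok in enumerate(target_ids):
        dist[t, tok] = 1.0 - epsilon
        # penalize tokens seen earlier in the reference prefix, unless the
        # reference itself repeats that token at this position
        for prev in set(target_ids[:t]):
            if prev != tok:
                dist[t, prev] = max(dist[t, prev] - rep_penalty, 0.0)
        dist[t] /= dist[t].sum()  # renormalize to a valid distribution
    return dist
```

A cross-entropy loss against these soft targets would then penalize the model for assigning probability to repeated tokens, with no change needed to the decoding procedure at inference time, consistent with the abstract's claim.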
Anthology ID:
2024.wmt-1.108
Volume:
Proceedings of the Ninth Conference on Machine Translation
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venue:
WMT
Publisher:
Association for Computational Linguistics
Pages:
1056–1062
URL:
https://aclanthology.org/2024.wmt-1.108
Cite (ACL):
Marko Avila and Josep Crego. 2024. SYSTRAN @ WMT24 Non-Repetitive Translation Task. In Proceedings of the Ninth Conference on Machine Translation, pages 1056–1062, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
SYSTRAN @ WMT24 Non-Repetitive Translation Task (Avila & Crego, WMT 2024)
PDF:
https://aclanthology.org/2024.wmt-1.108.pdf