@inproceedings{lin-etal-2025-construction,
title = "Construction-Based Reduction of Translationese for Low-Resource Languages: A Pilot Study on {B}avarian",
author = {Lin, Peiqin and
Thaler, Marion and
Goschala, Daniela and
Kargaran, Amir Hossein and
Liu, Yihong and
Martins, Andr{\'e} F. T. and
Sch{\"u}tze, Hinrich},
editor = "Hahn, Michael and
Rani, Priya and
Kumar, Ritesh and
Shcherbakov, Andreas and
Sorokin, Alexey and
Serikov, Oleg and
Cotterell, Ryan and
Vylomova, Ekaterina",
booktitle = "Proceedings of the 7th Workshop on Research in Computational Linguistic Typology and Multilingual NLP",
month = aug,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.sigtyp-1.13/",
doi = "10.18653/v1/2025.sigtyp-1.13",
pages = "114--121",
ISBN = "979-8-89176-281-7",
abstract = "When translating into a low-resource language, a language model can have a tendency to produce translations that are close to the source (e.g., word-by-word translations) due to a lack of rich low-resource training data in pretraining. Thus, the output often is translationese that differs considerably from what native speakers would produce naturally. To remedy this, we synthetically create a training set in which the frequency of a construction unique to the low-resource language is artificially inflated. For the case of Bavarian, we show that, after training, the language model has learned the unique construction and that native speakers judge its output as more natural. Our pilot study suggests that construction-based mitigation of translationese is a promising approach. Code and artifacts are available at \url{https://github.com/cisnlp/BayernGPT}."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="lin-etal-2025-construction">
<titleInfo>
<title>Construction-Based Reduction of Translationese for Low-Resource Languages: A Pilot Study on Bavarian</title>
</titleInfo>
<name type="personal">
<namePart type="given">Peiqin</namePart>
<namePart type="family">Lin</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Marion</namePart>
<namePart type="family">Thaler</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Daniela</namePart>
<namePart type="family">Goschala</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Amir</namePart>
<namePart type="given">Hossein</namePart>
<namePart type="family">Kargaran</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yihong</namePart>
<namePart type="family">Liu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">André</namePart>
<namePart type="given">F</namePart>
<namePart type="given">T</namePart>
<namePart type="family">Martins</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Hinrich</namePart>
<namePart type="family">Schütze</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-08</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 7th Workshop on Research in Computational Linguistic Typology and Multilingual NLP</title>
</titleInfo>
<name type="personal">
<namePart type="given">Michael</namePart>
<namePart type="family">Hahn</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Priya</namePart>
<namePart type="family">Rani</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ritesh</namePart>
<namePart type="family">Kumar</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Andreas</namePart>
<namePart type="family">Shcherbakov</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Alexey</namePart>
<namePart type="family">Sorokin</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Oleg</namePart>
<namePart type="family">Serikov</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ryan</namePart>
<namePart type="family">Cotterell</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ekaterina</namePart>
<namePart type="family">Vylomova</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Vienna, Austria</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-281-7</identifier>
</relatedItem>
<abstract>When translating into a low-resource language, a language model can have a tendency to produce translations that are close to the source (e.g., word-by-word translations) due to a lack of rich low-resource training data in pretraining. Thus, the output often is translationese that differs considerably from what native speakers would produce naturally. To remedy this, we synthetically create a training set in which the frequency of a construction unique to the low-resource language is artificially inflated. For the case of Bavarian, we show that, after training, the language model has learned the unique construction and that native speakers judge its output as more natural. Our pilot study suggests that construction-based mitigation of translationese is a promising approach. Code and artifacts are available at https://github.com/cisnlp/BayernGPT.</abstract>
<identifier type="citekey">lin-etal-2025-construction</identifier>
<identifier type="doi">10.18653/v1/2025.sigtyp-1.13</identifier>
<location>
<url>https://aclanthology.org/2025.sigtyp-1.13/</url>
</location>
<part>
<date>2025-08</date>
<extent unit="page">
<start>114</start>
<end>121</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Construction-Based Reduction of Translationese for Low-Resource Languages: A Pilot Study on Bavarian
%A Lin, Peiqin
%A Thaler, Marion
%A Goschala, Daniela
%A Kargaran, Amir Hossein
%A Liu, Yihong
%A Martins, André F. T.
%A Schütze, Hinrich
%Y Hahn, Michael
%Y Rani, Priya
%Y Kumar, Ritesh
%Y Shcherbakov, Andreas
%Y Sorokin, Alexey
%Y Serikov, Oleg
%Y Cotterell, Ryan
%Y Vylomova, Ekaterina
%S Proceedings of the 7th Workshop on Research in Computational Linguistic Typology and Multilingual NLP
%D 2025
%8 August
%I Association for Computational Linguistics
%C Vienna, Austria
%@ 979-8-89176-281-7
%F lin-etal-2025-construction
%X When translating into a low-resource language, a language model can have a tendency to produce translations that are close to the source (e.g., word-by-word translations) due to a lack of rich low-resource training data in pretraining. Thus, the output often is translationese that differs considerably from what native speakers would produce naturally. To remedy this, we synthetically create a training set in which the frequency of a construction unique to the low-resource language is artificially inflated. For the case of Bavarian, we show that, after training, the language model has learned the unique construction and that native speakers judge its output as more natural. Our pilot study suggests that construction-based mitigation of translationese is a promising approach. Code and artifacts are available at https://github.com/cisnlp/BayernGPT.
%R 10.18653/v1/2025.sigtyp-1.13
%U https://aclanthology.org/2025.sigtyp-1.13/
%U https://doi.org/10.18653/v1/2025.sigtyp-1.13
%P 114-121
Markdown (Informal)
[Construction-Based Reduction of Translationese for Low-Resource Languages: A Pilot Study on Bavarian](https://aclanthology.org/2025.sigtyp-1.13/) (Lin et al., SIGTYP 2025)
ACL