Together We Can: Multilingual Automatic Post-Editing for Low-Resource Languages

Sourabh Deoghare; Diptesh Kanojia; Pushpak Bhattacharyya

Abstract

This exploratory study investigates the potential of multilingual Automatic Post-Editing (APE) systems to enhance the quality of machine translations for low-resource Indo-Aryan languages. Focusing on two closely related language pairs, English-Marathi and English-Hindi, we exploit their linguistic similarities to develop a robust multilingual APE model. To facilitate cross-linguistic transfer, we generate synthetic Hindi-Marathi and Marathi-Hindi APE triplets. Additionally, we incorporate a Quality Estimation (QE)-APE multi-task learning framework. The experimental results underline the complementary nature of APE and QE, and we also observe that QE-APE multi-task learning facilitates effective domain adaptation. Our experiments demonstrate that the multilingual APE models outperform their corresponding English-Hindi and English-Marathi single-pair models by 2.5 and 2.39 TER points, respectively. Further notable improvements over the multilingual APE model come from multi-task learning (+1.29 and +1.44 TER points), data augmentation (+0.53 and +0.45 TER points), and domain adaptation (+0.35 and +0.45 TER points). We publicly release the synthetic data, code, and models produced during this study for further research.

Anthology ID: 2024.findings-emnlp.634
Volume: Findings of the Association for Computational Linguistics: EMNLP 2024
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 10800–10812
URL: https://aclanthology.org/2024.findings-emnlp.634
Cite (ACL): Sourabh Deoghare, Diptesh Kanojia, and Pushpak Bhattacharyya. 2024. Together We Can: Multilingual Automatic Post-Editing for Low-Resource Languages. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 10800–10812, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): Together We Can: Multilingual Automatic Post-Editing for Low-Resource Languages (Deoghare et al., Findings 2024)
PDF: https://aclanthology.org/2024.findings-emnlp.634.pdf
