Together We Can: Multilingual Automatic Post-Editing for Low-Resource Languages

Sourabh Deoghare, Diptesh Kanojia, Pushpak Bhattacharyya


Abstract
This exploratory study investigates whether multilingual Automatic Post-Editing (APE) systems can improve the quality of machine translations for low-resource Indo-Aryan languages. Focusing on two closely related language pairs, English-Marathi and English-Hindi, we exploit their linguistic similarities to develop a robust multilingual APE model. To facilitate cross-linguistic transfer, we generate synthetic Hindi-Marathi and Marathi-Hindi APE triplets. Additionally, we incorporate a Quality Estimation (QE)-APE multi-task learning framework. The experimental results underline the complementary nature of APE and QE, and we also observe that QE-APE multi-task learning facilitates effective domain adaptation. Our experiments demonstrate that the multilingual APE models outperform their corresponding English-Hindi and English-Marathi single-pair models by 2.5 and 2.39 TER points, respectively. Further improvements over the multilingual APE model come from multi-task learning (+1.29 and +1.44 TER points), data augmentation (+0.53 and +0.45 TER points), and domain adaptation (+0.35 and +0.45 TER points). We publicly release the synthetic data, code, and models produced during this study for further research.
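
As a rough illustration of the terms used in the abstract, the sketch below represents an APE triplet (source sentence, machine translation, post-edit) and computes a simplified, shift-free variant of TER: the word-level edit distance between the translation and its post-edit, divided by the post-edit length. This is not the authors' evaluation code. Standard TER also allows block shifts, which this sketch omits, so it can overestimate the true score. The names `ApeTriplet`, `word_edit_distance`, and `simplified_ter` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class ApeTriplet:
    """One APE training example (hypothetical container for illustration)."""
    source: str      # source-language sentence
    mt_output: str   # raw machine translation of the source
    post_edit: str   # corrected translation, used as the reference


def word_edit_distance(hyp: list[str], ref: list[str]) -> int:
    """Token-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis token
                            curr[j - 1] + 1,      # insert a reference token
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def simplified_ter(hypothesis: str, reference: str) -> float:
    """Shift-free TER approximation in percent; lower is better.

    Assumption: whitespace tokenization and no block shifts, unlike full TER.
    """
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must contain at least one token")
    return 100.0 * word_edit_distance(hyp, ref) / len(ref)


if __name__ == "__main__":
    triplet = ApeTriplet(
        source="(source sentence)",
        mt_output="the cat sat on mat",
        post_edit="the cat sat on the mat",
    )
    print(simplified_ter(triplet.mt_output, triplet.post_edit))
```

Under this simplified measure, an APE system helps when the score of its output against the post-edit is lower than the score of the raw machine translation against the same post-edit.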
Anthology ID:
2024.findings-emnlp.634
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
10800–10812
URL:
https://aclanthology.org/2024.findings-emnlp.634
Cite (ACL):
Sourabh Deoghare, Diptesh Kanojia, and Pushpak Bhattacharyya. 2024. Together We Can: Multilingual Automatic Post-Editing for Low-Resource Languages. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 10800–10812, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Together We Can: Multilingual Automatic Post-Editing for Low-Resource Languages (Deoghare et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-emnlp.634.pdf