Findings of the CoCo4MT 2023 Shared Task on Corpus Construction for Machine Translation

Ananya Ganesh, Marine Carpuat, William Chen, Katharina Kann, Constantine Lignos, John E. Ortega, Jonne Saleva, Shabnam Tafreshi, Rodolfo Zevallos


Abstract
This paper provides an overview of the first shared task on choosing beneficial instances for machine translation, conducted as part of the CoCo4MT 2023 Workshop at MTSummit. The shared task was motivated by the need to make the data annotation process for machine translation more efficient, particularly for low-resource languages for which collecting human translations may be difficult or expensive. The task involved developing methods for selecting the most beneficial instances for training a machine translation system without access to an existing parallel dataset in the target language, so that the selected instances can then be manually translated. Two teams participated in the shared task: the Williams team and the AST team. Submissions were evaluated by training a machine translation model on each submission’s chosen instances and comparing the resulting models' performance using the chrF++ score. The first-ranked system, from the Williams team, finds representative instances by clustering the training data.
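The abstract says only that the first-ranked system finds representative instances by clustering the training data. As a rough illustration of that general idea, and not a reconstruction of the Williams team's system, the sketch below clusters source-side sentences and keeps the sentence closest to each cluster centroid as a candidate for manual translation. The character trigram features, the cosine-based k-means loop, the evenly spaced seeding, and the function names (`char_ngrams`, `select_representatives`) are all assumptions made for this example.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Bag of character n-grams for one sentence (assumed feature choice)."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {key: count / len(vectors) for key, count in total.items()}


def select_representatives(sentences, k, iterations=10):
    """Return sorted indices of up to k sentences, one per non-empty cluster."""
    k = min(k, len(sentences))
    if k <= 0:
        return []
    vectors = [char_ngrams(s) for s in sentences]

    # Deterministic seeding (an assumption): evenly spaced sentences.
    step = len(sentences) / k
    centroids = [dict(vectors[int(i * step)]) for i in range(k)]

    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each sentence to its most similar centroid.
        assignment = [
            max(range(k), key=lambda c: cosine(vec, centroids[c]))
            for vec in vectors
        ]
        # Recompute each centroid from its current members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = mean_vector(members)

    # Keep the member closest to each centroid as the cluster representative.
    chosen = []
    for c in range(k):
        members = [i for i, a in enumerate(assignment) if a == c]
        if members:
            chosen.append(
                max(members, key=lambda i: cosine(vectors[i], centroids[c]))
            )
    return sorted(chosen)


if __name__ == "__main__":
    pool = [
        "the cat sat on the mat",
        "the cat sat on a mat",
        "the cat sits on the mat",
        "stock prices fell sharply today",
        "stock prices rose sharply today",
        "stock prices fell sharply yesterday",
    ]
    for index in select_representatives(pool, k=2):
        print(index, pool[index])
```

In a setting like the one the abstract describes, the selected sentences would then be sent for human translation, and a model trained on the resulting pairs would be scored with chrF++. Surface n-gram features are used here only to keep the sketch dependency-free.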
Anthology ID:
2023.mtsummit-coco4mt.3
Volume:
Proceedings of the Second Workshop on Corpus Generation and Corpus Augmentation for Machine Translation
Month:
September
Year:
2023
Address:
Macau SAR, China
Venue:
MTSummit
Publisher:
Asia-Pacific Association for Machine Translation
Pages:
22–27
URL:
https://aclanthology.org/2023.mtsummit-coco4mt.3
Cite (ACL):
Ananya Ganesh, Marine Carpuat, William Chen, Katharina Kann, Constantine Lignos, John E. Ortega, Jonne Saleva, Shabnam Tafreshi, and Rodolfo Zevallos. 2023. Findings of the CoCo4MT 2023 Shared Task on Corpus Construction for Machine Translation. In Proceedings of the Second Workshop on Corpus Generation and Corpus Augmentation for Machine Translation, pages 22–27, Macau SAR, China. Asia-Pacific Association for Machine Translation.
Cite (Informal):
Findings of the CoCo4MT 2023 Shared Task on Corpus Construction for Machine Translation (Ganesh et al., MTSummit 2023)
PDF:
https://aclanthology.org/2023.mtsummit-coco4mt.3.pdf