Assessing the Role of Context in Chat Translation Evaluation: Is Context Helpful and Under What Conditions?

Sweta Agrawal; Amin Farajian; Patrick Fernandes; Ricardo Rei; André F. T. Martins

doi:10.1162/tacl_a_00700

Abstract

Despite the recent success of automatic metrics for assessing translation quality, their application to evaluating the quality of machine-translated chats has been limited. Unlike more structured texts such as news, chat conversations are often unstructured, short, and heavily reliant on contextual information. This raises questions about the reliability of existing sentence-level metrics in this domain and about the role of context in assessing translation quality. Motivated by this, we conduct a meta-evaluation of existing automatic metrics, primarily designed for structured domains such as news, to assess the quality of machine-translated chats. We find that reference-free metrics lag behind reference-based ones, especially when evaluating translation quality in out-of-English settings. We then investigate how incorporating conversational context into these metrics for sentence-level evaluation affects their performance. Our findings show that augmenting neural learned metrics with contextual information helps improve correlation with human judgments in the reference-free scenario and when evaluating translations in out-of-English settings. Finally, we propose a new evaluation metric, Context-MQM, that uses bilingual context with a large language model (LLM), and further validate that adding context helps even for LLM-based evaluation metrics.
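The meta-evaluation summarized above compares segment-level metric scores against human judgments, with and without conversational context. The following is a minimal sketch under stated assumptions, not the paper's implementation: it assumes Kendall's tau-b as the correlation statistic and assumes context is added by prepending the previous `k` chat turns joined by a separator. The helper `add_context`, the parameter `k`, and the `" </s> "` separator are hypothetical choices made for illustration.

```python
import math
from itertools import combinations
from typing import Sequence


def add_context(turns: Sequence[str], index: int, k: int = 2, sep: str = " </s> ") -> str:
    """Hypothetical helper: prepend up to k previous chat turns to turn `index`.

    The separator and window size are illustrative assumptions, not the
    configuration used in the paper.
    """
    start = max(0, index - k)
    return sep.join(list(turns[start:index]) + [turns[index]])


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b between metric scores x and human scores y.

    Pairs tied in both x and y are excluded; pairs tied in only one
    variable count toward that variable's side of the denominator.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = ties_x_only = ties_y_only = 0
    for i, j in combinations(range(len(x)), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both: contributes to neither count
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = math.sqrt(
        (concordant + discordant + ties_x_only) * (concordant + discordant + ties_y_only)
    )
    return (concordant - discordant) / denom if denom else 0.0


if __name__ == "__main__":
    turns = ["Hi, my order is late.", "Sorry to hear that.", "Can you check it?"]
    print(add_context(turns, 2, k=1))  # Sorry to hear that. </s> Can you check it?

    # Toy numbers for illustration only; these are NOT results from the paper.
    human = [1.0, 2.0, 3.0, 4.0]
    metric_no_context = [0.2, 0.1, 0.4, 0.3]
    metric_with_context = [0.1, 0.2, 0.3, 0.4]
    print(round(kendall_tau_b(metric_no_context, human), 3))  # 0.333
    print(kendall_tau_b(metric_with_context, human))  # 1.0
```

Comparing the two correlations for the same segments is the basic shape of a "does context help?" check; a real study would score actual metric outputs against human annotations (for example, MQM-style judgments) and would typically report correlations per language pair and direction.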

Anthology ID: 2024.tacl-1.69
Volume: Transactions of the Association for Computational Linguistics, Volume 12
Year: 2024
Address: Cambridge, MA
Venue: TACL
Publisher: MIT Press
Pages: 1250–1267
URL: https://aclanthology.org/2024.tacl-1.69/
DOI: 10.1162/tacl_a_00700
Cite (ACL): Sweta Agrawal, Amin Farajian, Patrick Fernandes, Ricardo Rei, and André F. T. Martins. 2024. Assessing the Role of Context in Chat Translation Evaluation: Is Context Helpful and Under What Conditions?. Transactions of the Association for Computational Linguistics, 12:1250–1267.
Cite (Informal): Assessing the Role of Context in Chat Translation Evaluation: Is Context Helpful and Under What Conditions? (Agrawal et al., TACL 2024)
PDF: https://aclanthology.org/2024.tacl-1.69.pdf
