Error Identification for Machine Translation with Metric Embedding and Attention

Raphael Rubino, Atsushi Fujita, Benjamin Marie


Abstract
Quality Estimation (QE) for Machine Translation has been shown to reach relatively high accuracy in predicting sentence-level scores, relying on pretrained contextual embeddings and human-produced quality scores. However, the lack of explanations along with decisions made by end-to-end neural models makes the results difficult to interpret. Furthermore, word-level annotated datasets are rare due to the prohibitive effort required to perform this task, while they could provide interpretable signals in addition to sentence-level QE outputs. In this paper, we propose a novel QE architecture which tackles both the word-level data scarcity and the interpretability limitations of recent approaches. Sentence-level and word-level components are jointly pretrained through an attention mechanism based on synthetic data and a set of MT metrics embedded in a common space. Our approach is evaluated on the Eval4NLP 2021 shared task and our submissions reach the first position in all language pairs. The extraction of metric-to-input attention weights shows that different metrics focus on different parts of the source and target text, providing strong rationales in the decision-making process of the QE model.
Anthology ID:
2021.eval4nlp-1.15
Volume:
Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Yang Gao, Steffen Eger, Wei Zhao, Piyawat Lertvittayakumjorn, Marina Fomicheva
Venue:
Eval4NLP
Publisher:
Association for Computational Linguistics
Pages:
146–156
URL:
https://aclanthology.org/2021.eval4nlp-1.15
DOI:
10.18653/v1/2021.eval4nlp-1.15
Cite (ACL):
Raphael Rubino, Atsushi Fujita, and Benjamin Marie. 2021. Error Identification for Machine Translation with Metric Embedding and Attention. In Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems, pages 146–156, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Error Identification for Machine Translation with Metric Embedding and Attention (Rubino et al., Eval4NLP 2021)
PDF:
https://aclanthology.org/2021.eval4nlp-1.15.pdf
Video:
https://aclanthology.org/2021.eval4nlp-1.15.mp4
Data
OPUS