Conformalizing Machine Translation Evaluation

Chrysoula Zerva, André F. T. Martins


Abstract
Several uncertainty estimation methods have recently been proposed for machine translation evaluation. While these methods can provide a useful indication of when not to trust model predictions, we show in this paper that the majority of them tend to underestimate model uncertainty, and as a result they often produce misleading confidence intervals that do not cover the ground truth. We propose as an alternative the use of conformal prediction, a distribution-free method to obtain confidence intervals with a theoretically established guarantee on coverage. First, we demonstrate that split conformal prediction can “correct” the confidence intervals of previous methods to yield a desired coverage level, and we verify these findings across multiple machine translation evaluation metrics and uncertainty quantification methods. Further, we highlight biases in the estimated confidence intervals, reflected in imbalanced coverage for different attributes, such as the language and the quality of translations. We address this by applying conditional conformal prediction techniques to obtain calibration subsets for each data subgroup, leading to equalized coverage. Overall, we show that, provided access to a calibration set, conformal prediction can help identify the most suitable uncertainty quantification methods and adapt the predicted confidence intervals to ensure fairness with respect to different attributes.
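For intuition, below is a minimal NumPy sketch of the two procedures the abstract refers to: split conformal prediction, which calibrates a symmetric interval half-width on a held-out calibration set and, under exchangeability, guarantees marginal coverage of at least 1 - alpha; and a group-conditional variant that calibrates a separate quantile per data subgroup to equalize coverage. The absolute-residual nonconformity score and all function and variable names here are illustrative assumptions, not the paper's exact formulation (the paper conformalizes intervals produced by existing uncertainty quantification methods).

import numpy as np

def split_conformal_interval(cal_preds, cal_labels, test_preds, alpha=0.1):
    # Nonconformity scores: absolute residuals on the calibration set.
    scores = np.abs(np.asarray(cal_labels) - np.asarray(cal_preds))
    n = scores.size
    # Finite-sample quantile rank ceil((n + 1) * (1 - alpha)), 1-indexed,
    # clipped so very small calibration sets do not index out of bounds.
    rank = min(n, int(np.ceil((n + 1) * (1 - alpha))))
    qhat = np.sort(scores)[rank - 1]
    test_preds = np.asarray(test_preds)
    return test_preds - qhat, test_preds + qhat

def group_conditional_intervals(cal_preds, cal_labels, cal_groups,
                                test_preds, test_groups, alpha=0.1):
    # Calibrate a separate quantile per group (e.g., per language pair),
    # which equalizes coverage across groups at the cost of requiring
    # enough calibration points within each group.
    test_preds = np.asarray(test_preds)
    lower = np.empty(test_preds.shape)
    upper = np.empty(test_preds.shape)
    for g in np.unique(test_groups):
        cal_mask = np.asarray(cal_groups) == g
        test_mask = np.asarray(test_groups) == g
        lo, hi = split_conformal_interval(
            np.asarray(cal_preds)[cal_mask],
            np.asarray(cal_labels)[cal_mask],
            test_preds[test_mask],
            alpha,
        )
        lower[test_mask], upper[test_mask] = lo, hi
    return lower, upper

With alpha = 0.1, roughly 90% of ground-truth quality scores should fall inside the returned intervals on exchangeable data; the group-conditional variant trades statistical efficiency for per-group coverage, mirroring the equalized-coverage idea described in the abstract.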
Anthology ID:
2024.tacl-1.80
Volume:
Transactions of the Association for Computational Linguistics, Volume 12
Year:
2024
Address:
Cambridge, MA
Venue:
TACL
Publisher:
MIT Press
Pages:
1460–1478
URL:
https://aclanthology.org/2024.tacl-1.80/
DOI:
10.1162/tacl_a_00711
Cite (ACL):
Chrysoula Zerva and André F. T. Martins. 2024. Conformalizing Machine Translation Evaluation. Transactions of the Association for Computational Linguistics, 12:1460–1478.
Cite (Informal):
Conformalizing Machine Translation Evaluation (Zerva & Martins, TACL 2024)
PDF:
https://aclanthology.org/2024.tacl-1.80.pdf