Quality Scoring of Source Words in Neural Translation Models

Priyesh Jain, Sunita Sarawagi, Tushar Tomar


Abstract
Word-level quality scores on input source sentences can provide useful feedback to an end-user when translating into an unfamiliar target language. Recent approaches either require training special word-scoring models based on synthetic data or require repeated invocation of the translation model. We propose a simple approach based on comparing the difference of probabilities from two language models. The basic premise of our method is to reason about how well each source word is explained by the target sentence as opposed to by the source language model alone. Our approach provides up to five points higher F1 scores and is significantly faster than state-of-the-art methods on three language pairs. Moreover, our method does not require training any new model. We release a public dataset on word omissions and mistranslations for a new language pair.
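As a hypothetical sketch (not the authors' released code), the core idea described in the abstract — scoring each source word by the gap between its probability under a target-conditioned model and under a plain source-side language model — might look like the following. The probability tables here are made-up stand-ins for real model outputs.

```python
import math

def word_quality_scores(src_words, p_word_given_target, p_word_given_source_lm):
    """Return per-word scores: log P(w | target) - log P(w | source LM).

    A low (negative) score means the target sentence explains the source
    word worse than the source language model alone does, flagging it as
    a candidate omission or mistranslation.
    """
    scores = {}
    for w in src_words:
        log_p_tgt = math.log(p_word_given_target[w])
        log_p_src = math.log(p_word_given_source_lm[w])
        scores[w] = log_p_tgt - log_p_src
    return scores

# Hypothetical probabilities for a 3-word source sentence; in practice
# these would come from a translation model and a source language model.
p_tgt = {"the": 0.30, "red": 0.02, "car": 0.25}   # "red" dropped in translation
p_src = {"the": 0.28, "red": 0.10, "car": 0.20}

scores = word_quality_scores(["the", "red", "car"], p_tgt, p_src)
flagged = [w for w, s in scores.items() if s < 0]
print(flagged)  # -> ['red']
```

Because both probabilities come from already-trained models, a score like this needs no extra training and only one pass per model, consistent with the speed claim in the abstract.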
Anthology ID: 2022.emnlp-main.732
Volume: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 10683–10691
URL: https://aclanthology.org/2022.emnlp-main.732
DOI: 10.18653/v1/2022.emnlp-main.732
Cite (ACL): Priyesh Jain, Sunita Sarawagi, and Tushar Tomar. 2022. Quality Scoring of Source Words in Neural Translation Models. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 10683–10691, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): Quality Scoring of Source Words in Neural Translation Models (Jain et al., EMNLP 2022)
PDF: https://aclanthology.org/2022.emnlp-main.732.pdf