Lost & Found in Translation: Impact of Machine Translated Results on Translingual Information Retrieval

Kristen Parton, Nizar Habash, Kathleen McKeown


Abstract
In an ideal cross-lingual information retrieval (CLIR) system, a user query would generate a search over documents in a different language and the relevant results would be presented in the user’s language. In practice, CLIR systems are typically evaluated by judging result relevance in the document language, to factor out the effects of translating the results using machine translation (MT). In this paper, we investigate the influence of four different approaches for integrating MT and CLIR on both retrieval accuracy and user judgment of relevance. We create a corpus with relevance judgments for both human- and machine-translated results, and use it to quantify the effect that MT quality has on end-to-end relevance. We find that MT errors result in a 16-39% decrease in mean average precision relative to the ground truth system that uses human translations. MT errors also cause relevant sentences to appear irrelevant: 5-19% of sentences that were relevant in human translation were judged irrelevant in MT. To counter this degradation, we present two hybrid retrieval models and two automatic MT post-editing techniques and show that these approaches substantially mitigate the errors and improve end-to-end relevance.
Anthology ID:
2012.amta-papers.12
Volume:
Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Research Papers
Month:
October 28-November 1
Year:
2012
Address:
San Diego, California, USA
Venue:
AMTA
Publisher:
Association for Machine Translation in the Americas
URL:
https://aclanthology.org/2012.amta-papers.12
Cite (ACL):
Kristen Parton, Nizar Habash, and Kathleen McKeown. 2012. Lost & Found in Translation: Impact of Machine Translated Results on Translingual Information Retrieval. In Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Research Papers, San Diego, California, USA. Association for Machine Translation in the Americas.
Cite (Informal):
Lost & Found in Translation: Impact of Machine Translated Results on Translingual Information Retrieval (Parton et al., AMTA 2012)
PDF:
https://aclanthology.org/2012.amta-papers.12.pdf