Document Translation vs. Query Translation for Cross-Lingual Information Retrieval in the Medical Domain

Shadi Saleh, Pavel Pecina


Abstract
We present a thorough comparison of two principal approaches to Cross-Lingual Information Retrieval: document translation (DT) and query translation (QT). Our experiments are conducted using the cross-lingual test collection produced within the CLEF eHealth information retrieval tasks in 2013–2015, which contains English documents and queries in several European languages. We exploit the Statistical Machine Translation (SMT) and Neural Machine Translation (NMT) paradigms and train several domain-specific and task-specific machine translation systems to translate the non-English queries into English (for the QT approach) and the English documents into all the query languages (for the DT approach). The results show that the quality of QT by SMT is sufficient for it to outperform the retrieval results of the DT approach for all the languages. NMT further boosts translation quality and retrieval quality for both QT and DT for most languages, but QT still generally provides better retrieval results than DT.
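The structural difference between the two approaches can be illustrated with a toy sketch. This is not the authors' system: the paper trains SMT and NMT models, whereas here a hypothetical word-level dictionary (`CS_TO_EN`) stands in for machine translation, a Czech query is used purely as an example, and a simple term-count score stands in for a real retrieval model. QT translates the short query into English and searches the English documents; DT translates every English document into the query language and searches with the original query.

```python
from collections import Counter

# Hypothetical toy "translation" table standing in for a trained MT system.
# The paper uses SMT/NMT models; this dictionary is for illustration only.
CS_TO_EN = {"bolest": "pain", "hlavy": "head", "horečka": "fever"}
EN_TO_CS = {en: cs for cs, en in CS_TO_EN.items()}


def translate(text, table):
    """Word-by-word lookup; words missing from the table pass through unchanged."""
    return [table.get(tok, tok) for tok in text.lower().split()]


def score(query_tokens, doc_tokens):
    """Number of query-term occurrences in the document (stand-in for a real model)."""
    counts = Counter(doc_tokens)
    return sum(counts[t] for t in query_tokens)


def rank(query_tokens, docs_tokens):
    """Return document indices sorted by descending score, ties broken by index."""
    scored = [(score(query_tokens, d), i) for i, d in enumerate(docs_tokens)]
    return [i for _, i in sorted(scored, key=lambda x: (-x[0], x[1]))]


def query_translation(query_cs, docs_en):
    # QT: translate the query once into the document language (English).
    q_en = translate(query_cs, CS_TO_EN)
    return rank(q_en, [d.lower().split() for d in docs_en])


def document_translation(query_cs, docs_en):
    # DT: translate every document into the query language; keep the query as-is.
    docs_cs = [translate(d, EN_TO_CS) for d in docs_en]
    return rank(query_cs.lower().split(), docs_cs)


docs = ["Fever and head pain in children", "Knee injury treatment", "head pain relief"]
print(query_translation("bolest hlavy", docs))     # [0, 2, 1]
print(document_translation("bolest hlavy", docs))  # [0, 2, 1]
```

Both approaches rank the documents identically here only because the toy dictionary translates perfectly in both directions. In practice, QT translates one short query per search, while DT translates the whole collection offline into each query language, so the retrieval outcome depends on how well each MT system handles short queries versus full documents.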
Anthology ID: 2020.acl-main.613
Volume: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Month: July
Year: 2020
Address: Online
Editors: Dan Jurafsky, Joyce Chai, Natalie Schluter, Joel Tetreault
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 6849–6860
URL: https://aclanthology.org/2020.acl-main.613
DOI: 10.18653/v1/2020.acl-main.613
Cite (ACL): Shadi Saleh and Pavel Pecina. 2020. Document Translation vs. Query Translation for Cross-Lingual Information Retrieval in the Medical Domain. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6849–6860, Online. Association for Computational Linguistics.
Cite (Informal): Document Translation vs. Query Translation for Cross-Lingual Information Retrieval in the Medical Domain (Saleh & Pecina, ACL 2020)
PDF: https://aclanthology.org/2020.acl-main.613.pdf
Video: http://slideslive.com/38929450