Complementing dictionary-based query translations with corpus statistics for cross-language IR

Sung Hyon Myaeng, Mung-Gil Jang


Abstract
In cross-language information retrieval (CLIR), queries or documents are often translated into the other language to create a monolingual retrieval situation. Having surveyed recent research on translation-based CLIR, we conclude that an effective query translation method is essential for a practical CLIR system of reasonable quality. After summarizing the arguments and methods for query translation and the survey results for dictionary-based translation methods, this paper describes a relatively simple yet effective method that uses mutual information to handle translation ambiguity, which is known to be the major cause of low performance relative to the monolingual situation. Our experimental results on the TREC-6 collection show that this method can achieve up to 85% of the performance of monolingual retrieval and 96% of that of manual disambiguation.
Anthology ID:
1999.mtsummit-1.25
Volume:
Proceedings of Machine Translation Summit VII
Month:
September 13-17
Year:
1999
Address:
Singapore, Singapore
Venue:
MTSummit
Pages:
165–174
URL:
https://aclanthology.org/1999.mtsummit-1.25
Cite (ACL):
Sung Hyon Myaeng and Mung-Gil Jang. 1999. Complementing dictionary-based query translations with corpus statistics for cross-language IR. In Proceedings of Machine Translation Summit VII, pages 165–174, Singapore, Singapore.
Cite (Informal):
Complementing dictionary-based query translations with corpus statistics for cross-language IR (Myaeng & Jang, MTSummit 1999)
PDF:
https://aclanthology.org/1999.mtsummit-1.25.pdf