Fast document translation for cross-language information retrieval

J. Scott McCarley, Salim Roukos


Abstract
We describe a statistical algorithm for machine translation intended to provide translations of large document collections at speeds far in excess of traditional machine translation systems, and of sufficiently high quality to perform information retrieval on the translated document collections. The model is trained from a parallel corpus and is capable of disambiguating senses of words. Information retrieval (IR) experiments on a French-language dataset from a recent cross-language information retrieval evaluation yield results superior to those obtained by participants in the evaluation, and confirm the importance of word sense disambiguation in cross-language information retrieval.
Anthology ID:
1998.amta-papers.13
Volume:
Proceedings of the Third Conference of the Association for Machine Translation in the Americas: Technical Papers
Month:
October 28-31
Year:
1998
Address:
Langhorne, PA, USA
Editors:
David Farwell, Laurie Gerber, Eduard Hovy
Venue:
AMTA
Publisher:
Springer
Pages:
150–157
URL:
https://link.springer.com/chapter/10.1007/3-540-49478-2_14
Cite (ACL):
J. Scott McCarley and Salim Roukos. 1998. Fast document translation for cross-language information retrieval. In Proceedings of the Third Conference of the Association for Machine Translation in the Americas: Technical Papers, pages 150–157, Langhorne, PA, USA. Springer.
Cite (Informal):
Fast document translation for cross-language information retrieval (McCarley & Roukos, AMTA 1998)
PDF:
https://link.springer.com/chapter/10.1007/3-540-49478-2_14