Query Translation for Cross-Language Information Retrieval using Multilingual Word Clusters

Paheli Bhattacharya, Pawan Goyal, Sudeshna Sarkar


Abstract
In Cross-Language Information Retrieval, finding an appropriate translation of the source-language query has long been a difficult problem. We propose an approach to this problem based on multilingual word clusters obtained from multilingual word embeddings. Word embeddings of the languages are projected into a common vector space, and a community-detection algorithm is applied in that space to find clusters such that words from different languages that represent the same concept fall into the same group. We use these multilingual word clusters to perform query translation for Cross-Language Information Retrieval across three languages: English, Hindi and Bengali. In experiments on the FIRE 2012 and Wikipedia datasets, the approach shows improvements over several standard methods, including a dictionary-based method, a transliteration-based model and Google Translate.
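
As a rough illustration of the pipeline outlined in the abstract, the sketch below clusters words from three languages that already live in one vector space and then translates a query by looking up same-cluster words in the target language. It is not the authors' implementation: the vectors, the romanized example words, the 0.9 similarity threshold and the function names are all assumptions made for this example, and connected components over a thresholded cosine-similarity graph stand in for the community-detection algorithm, which the abstract does not name.

```python
import math
from itertools import combinations

# Toy vectors, assumed to be already projected into a shared space.
# Keys are (language, word); all values are invented for illustration.
EMBEDDINGS = {
    ("en", "water"): (0.90, 0.10, 0.00),
    ("hi", "paani"): (0.88, 0.12, 0.02),
    ("bn", "jol"): (0.91, 0.08, 0.01),
    ("en", "book"): (0.05, 0.95, 0.10),
    ("hi", "kitaab"): (0.07, 0.93, 0.12),
    ("bn", "boi"): (0.04, 0.96, 0.09),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_clusters(embeddings, threshold=0.9):
    """Group words whose similarity reaches the threshold.

    Simplification: connected components (via union-find) replace a real
    community-detection algorithm.
    """
    parent = {node: node for node in embeddings}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in combinations(embeddings, 2):
        if cosine(embeddings[a], embeddings[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for node in embeddings:
        clusters.setdefault(find(node), set()).add(node)
    return list(clusters.values())


def translate_query(query_words, source_lang, target_lang, clusters):
    """Replace each query word with target-language words from its cluster."""
    translated = []
    for word in query_words:
        for cluster in clusters:
            if (source_lang, word) in cluster:
                translated.extend(
                    sorted(w for lang, w in cluster if lang == target_lang)
                )
                break  # a word belongs to exactly one cluster here
    return translated


clusters = build_clusters(EMBEDDINGS)
print(translate_query(["water", "book"], "en", "hi", clusters))
```

Query words that appear in no cluster are simply dropped in this sketch; a practical system would need a fallback for them, such as transliteration or a dictionary lookup.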
Anthology ID: W16-3716
Volume: Proceedings of the 6th Workshop on South and Southeast Asian Natural Language Processing (WSSANLP2016)
Month: December
Year: 2016
Address: Osaka, Japan
Editors: Dekai Wu, Pushpak Bhattacharyya
Venue: WSSANLP
Publisher: The COLING 2016 Organizing Committee
Pages: 152–162
URL: https://aclanthology.org/W16-3716
Cite (ACL): Paheli Bhattacharya, Pawan Goyal, and Sudeshna Sarkar. 2016. Query Translation for Cross-Language Information Retrieval using Multilingual Word Clusters. In Proceedings of the 6th Workshop on South and Southeast Asian Natural Language Processing (WSSANLP2016), pages 152–162, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal): Query Translation for Cross-Language Information Retrieval using Multilingual Word Clusters (Bhattacharya et al., WSSANLP 2016)
PDF: https://aclanthology.org/W16-3716.pdf