Learning to Weight Translations using Ordinal Linear Regression and Query-generated Training Data for Ad-hoc Retrieval with Long Queries

Javid Dadashkarimi, Masoud Jalili Sabet, Azadeh Shakery


Abstract
Ordinal regression, known in this setting as learning to rank, has long been used in information retrieval (IR). Learning-to-rank algorithms have been applied successfully to document ranking, information filtering, and building large aligned corpora. In this paper, we propose to use such an algorithm for query modeling in cross-language environments. To this end, we first build query-generated training data from the documents that are pseudo-relevant to the query and from all translation candidates. The pseudo-relevant documents are the top-ranked documents retrieved in response to a translation of the original query. The class of each candidate in the training data is determined by the presence or absence of the candidate in the pseudo-relevant documents. We learn an ordinal regression model to score the candidates based on their relevance to the context of the query, and then construct a query-dependent translation model using a softmax function. Finally, we re-weight the query based on the obtained model. Experimental results on French, German, Spanish, and Italian CLEF collections demonstrate that the proposed method achieves better results than state-of-the-art cross-language information retrieval methods, particularly for long queries with large training data.
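Two steps of the pipeline described in the abstract can be illustrated with a minimal sketch: labeling translation candidates by their presence in the pseudo-relevant documents, and turning candidate scores into a query-dependent translation distribution with a softmax. The function names, the binary labels, and the example scores are assumptions made for illustration; the features and the ordinal regression learner used in the paper are not reproduced here.

```python
import math


def label_candidates(candidates, pseudo_relevant_docs):
    """Assign class 1 to a candidate that occurs in any pseudo-relevant
    document and class 0 otherwise (hypothetical binary labeling)."""
    vocabulary = set()
    for doc in pseudo_relevant_docs:
        vocabulary.update(doc)
    return {c: int(c in vocabulary) for c in candidates}


def softmax_translation_model(scores):
    """Map raw candidate scores for one source term to probabilities.

    The scores are assumed to come from a learned scoring model; the
    maximum is subtracted before exponentiation for numerical stability.
    """
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


# Hypothetical example: two French candidates for the English term "bank".
docs = [["la", "banque", "centrale"], ["taux", "banque"]]
labels = label_candidates(["banque", "rive"], docs)
weights = softmax_translation_model({"banque": 2.0, "rive": 0.0})
```

In this sketch, `labels` would serve as training targets and `weights` as the per-term translation probabilities used to re-weight the query.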
Anthology ID:
C16-1162
Volume:
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
Month:
December
Year:
2016
Address:
Osaka, Japan
Editors:
Yuji Matsumoto, Rashmi Prasad
Venue:
COLING
Publisher:
The COLING 2016 Organizing Committee
Pages:
1725–1733
URL:
https://aclanthology.org/C16-1162/
Cite (ACL):
Javid Dadashkarimi, Masoud Jalili Sabet, and Azadeh Shakery. 2016. Learning to Weight Translations using Ordinal Linear Regression and Query-generated Training Data for Ad-hoc Retrieval with Long Queries. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 1725–1733, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
Learning to Weight Translations using Ordinal Linear Regression and Query-generated Training Data for Ad-hoc Retrieval with Long Queries (Dadashkarimi et al., COLING 2016)
PDF:
https://aclanthology.org/C16-1162.pdf