Cross-Lingual Document Retrieval with Smooth Learning

Jiapeng Liu, Xiao Zhang, Dan Goldwasser, Xiao Wang


Abstract
Cross-lingual document search is an information retrieval task in which the queries and the documents are written in different languages. In this paper, we study the instability of neural document search models and propose a novel end-to-end robust framework that achieves improved performance in cross-lingual search across different document languages. This framework includes a novel measure of relevance between queries and documents, smooth cosine similarity, and a novel loss function, Smooth Ordinal Search Loss, as the objective function. We further provide a theoretical guarantee on the generalization error bound for the proposed framework. We conduct experiments to compare our approach with other document search models and observe significant gains under commonly used ranking metrics on the cross-lingual document retrieval task in a variety of languages.
Anthology ID:
2020.coling-main.323
Volume:
Proceedings of the 28th International Conference on Computational Linguistics
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Donia Scott, Nuria Bel, Chengqing Zong
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
3616–3629
URL:
https://aclanthology.org/2020.coling-main.323
DOI:
10.18653/v1/2020.coling-main.323
Cite (ACL):
Jiapeng Liu, Xiao Zhang, Dan Goldwasser, and Xiao Wang. 2020. Cross-Lingual Document Retrieval with Smooth Learning. In Proceedings of the 28th International Conference on Computational Linguistics, pages 3616–3629, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal):
Cross-Lingual Document Retrieval with Smooth Learning (Liu et al., COLING 2020)
PDF:
https://aclanthology.org/2020.coling-main.323.pdf
Code:
JiapengL/multi_ling_search