Weakly Supervised Attentional Model for Low Resource Ad-hoc Cross-lingual Information Retrieval

Lingjun Zhao, Rabih Zbib, Zhuolin Jiang, Damianos Karakos, Zhongqiang Huang


Abstract
We propose a weakly supervised neural model for Ad-hoc Cross-lingual Information Retrieval (CLIR) from low-resource languages. Low-resource languages often lack relevance annotations for CLIR, and when available, the training data usually has limited coverage of possible queries. In this paper, we design a model that does not require relevance annotations; instead, it is trained on samples extracted from translation corpora as weak supervision. The model relies on an attention mechanism to learn spans in the foreign sentence that are relevant to the query. We report experiments on two low-resource languages, Swahili and Tagalog, trained on less than 100k parallel sentences each. The proposed model achieves a 19-point MAP improvement over using CNNs for feature extraction, a 12-point improvement over machine translation-based CLIR, and up to a 6-point improvement over probabilistic CLIR models.
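The abstract's core idea is scoring a foreign sentence against a query by attending from query terms to sentence spans. The sketch below is not the authors' model; it is a minimal, hypothetical illustration (using NumPy and dot-product attention with made-up embedding inputs) of how query-to-sentence attention can yield a relevance score.

```python
import numpy as np

def attention_relevance(query_vecs, sent_vecs):
    """Score a foreign sentence against a query via attention.

    query_vecs: (q, d) array of query-term embeddings.
    sent_vecs:  (s, d) array of foreign-sentence token embeddings.
    Returns a scalar relevance score (illustrative only).
    """
    # Similarity between every query term and every sentence token.
    sims = query_vecs @ sent_vecs.T                        # (q, s)
    # Softmax over sentence positions: each query term's attention
    # over the foreign sentence (max-subtraction for stability).
    sims = sims - sims.max(axis=1, keepdims=True)
    attn = np.exp(sims) / np.exp(sims).sum(axis=1, keepdims=True)
    # Attention-weighted sentence representation per query term.
    attended = attn @ sent_vecs                            # (q, d)
    # Per-term match score, then aggregate over query terms.
    scores = (query_vecs * attended).sum(axis=1)           # (q,)
    return float(scores.mean())
```

In a trained model the embeddings would come from learned encoders and the score would feed a weak-supervision objective derived from parallel text; here random vectors would suffice to exercise the function.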
Anthology ID:
D19-6129
Volume:
Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019)
Month:
November
Year:
2019
Address:
Hong Kong, China
Venues:
EMNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
259–264
URL:
https://aclanthology.org/D19-6129
DOI:
10.18653/v1/D19-6129
Cite (ACL):
Lingjun Zhao, Rabih Zbib, Zhuolin Jiang, Damianos Karakos, and Zhongqiang Huang. 2019. Weakly Supervised Attentional Model for Low Resource Ad-hoc Cross-lingual Information Retrieval. In Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019), pages 259–264, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal):
Weakly Supervised Attentional Model for Low Resource Ad-hoc Cross-lingual Information Retrieval (Zhao et al., EMNLP 2019)
PDF:
https://aclanthology.org/D19-6129.pdf