Matching-oriented Embedding Quantization For Ad-hoc Retrieval

Shitao Xiao, Zheng Liu, Yingxia Shao, Defu Lian, Xing Xie


Abstract
Product quantization (PQ) is a widely used technique for ad-hoc retrieval. Recent studies propose supervised PQ, where the embedding and quantization models are jointly trained with supervised learning. However, the joint training objective lacks an appropriate formulation; as a result, the improvements over previous unsupervised baselines are limited in practice. In this work, we propose Matching-oriented Product Quantization (MoPQ), built on a novel objective, the Multinoulli Contrastive Loss (MCL). Minimizing MCL maximizes the matching probability between a query and its ground-truth key, which contributes to optimal retrieval accuracy. Because the exact computation of MCL is intractable, as it requires a vast number of contrastive samples, we further propose Differentiable Cross-device Sampling (DCS), which significantly augments the contrastive samples for a precise approximation of MCL. We conduct extensive experiments on four real-world datasets, and the results verify the effectiveness of MoPQ. The code is available at https://github.com/microsoft/MoPQ.
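The abstract names two ingredients: PQ, which replaces each sub-vector of a key embedding with its nearest codeword, and a contrastive objective that scores a query against its ground-truth (quantized) key relative to other keys. The following is a minimal, forward-pass-only sketch under stated assumptions: Euclidean nearest-codeword assignment, inner-product relevance, and in-batch keys as the contrastive samples. The names `quantize` and `contrastive_loss` are hypothetical, and the loss is a generic softmax contrastive loss used as a stand-in for the intuition behind MCL, not the paper's exact formulation. The hard `min` assignment is not differentiable, so joint training would need a differentiable relaxation, and the sketch does not implement cross-device sampling (DCS).

```python
import math


def split(vec, m):
    """Split a D-dim vector into m contiguous sub-vectors of equal length."""
    d = len(vec) // m
    return [vec[i * d:(i + 1) * d] for i in range(m)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dot(a, b):
    """Inner product, assumed here as the query-key relevance score."""
    return sum(x * y for x, y in zip(a, b))


def quantize(vec, codebooks):
    """Product quantization (assumed Euclidean assignment).

    codebooks[i] is the list of codewords for the i-th subspace.
    Returns (codes, reconstruction): one codeword index per subspace and
    the concatenation of the selected codewords.
    """
    codes, recon = [], []
    for sub, book in zip(split(vec, len(codebooks)), codebooks):
        # Hard nearest-codeword choice: not differentiable.
        j = min(range(len(book)), key=lambda c: sq_dist(sub, book[c]))
        codes.append(j)
        recon.extend(book[j])
    return codes, recon


def contrastive_loss(queries, quantized_keys):
    """Generic in-batch softmax contrastive loss (hypothetical stand-in).

    Query i's ground-truth key is quantized_keys[i]; every other key in the
    batch acts as a contrastive sample. Returns the mean of
    -log softmax(score of the ground-truth key) over the batch.
    """
    total = 0.0
    for i, q in enumerate(queries):
        scores = [dot(q, k) for k in quantized_keys]
        mx = max(scores)  # subtract the max for numerical stability
        log_z = mx + math.log(sum(math.exp(s - mx) for s in scores))
        total += log_z - scores[i]
    return total / len(queries)


if __name__ == "__main__":
    # Toy setup: D = 4, two subspaces, two 2-dim codewords per subspace.
    books = [[[0.0, 0.0], [1.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
    print(quantize([0.9, 1.1, 0.1, 0.8], books))  # -> ([1, 1], [1.0, 1.0, 0.0, 1.0])

    qs = [[1.0, 0.0], [0.0, 1.0]]
    matched = contrastive_loss(qs, [[1.0, 0.0], [0.0, 1.0]])
    swapped = contrastive_loss(qs, [[0.0, 1.0], [1.0, 0.0]])
    print(matched < swapped)  # -> True
```

Each additional key in the batch adds one term to the softmax denominator, which is why an objective of this kind is only approximated when few contrastive samples are available; enlarging that pool is the role the abstract assigns to DCS.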
Anthology ID:
2021.emnlp-main.640
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
8119–8129
URL:
https://aclanthology.org/2021.emnlp-main.640
DOI:
10.18653/v1/2021.emnlp-main.640
Cite (ACL):
Shitao Xiao, Zheng Liu, Yingxia Shao, Defu Lian, and Xing Xie. 2021. Matching-oriented Embedding Quantization For Ad-hoc Retrieval. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 8119–8129, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Matching-oriented Embedding Quantization For Ad-hoc Retrieval (Xiao et al., EMNLP 2021)
PDF:
https://aclanthology.org/2021.emnlp-main.640.pdf
Video:
https://aclanthology.org/2021.emnlp-main.640.mp4
Code:
microsoft/mopq