@inproceedings{huang-chen-2024-pairdistill,
title = "{P}air{D}istill: Pairwise Relevance Distillation for Dense Retrieval",
author = "Huang, Chao-Wei and
Chen, Yun-Nung",
editor = "Al-Onaizan, Yaser and
Bansal, Mohit and
Chen, Yun-Nung",
booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.emnlp-main.1013",
pages = "18225--18237",
abstract = "Effective information retrieval (IR) from vast datasets relies on advanced techniques to extract relevant information in response to queries. Recent advancements in dense retrieval have showcased remarkable efficacy compared to traditional sparse retrieval methods. To further enhance retrieval performance, knowledge distillation techniques, often leveraging robust cross-encoder rerankers, have been extensively explored. However, existing approaches primarily distill knowledge from pointwise rerankers, which assign absolute relevance scores to documents, thus facing challenges related to inconsistent comparisons. This paper introduces Pairwise Relevance Distillation (PairDistill) to leverage pairwise reranking, offering fine-grained distinctions between similarly relevant documents to enrich the training of dense retrieval models. Our experiments demonstrate that PairDistill outperforms existing methods, achieving new state-of-the-art results across multiple benchmarks. This highlights the potential of PairDistill in advancing dense retrieval techniques effectively. Our source code and trained models are released at https://github.com/MiuLab/PairDistill",
}
Markdown (Informal)
[PairDistill: Pairwise Relevance Distillation for Dense Retrieval](https://aclanthology.org/2024.emnlp-main.1013) (Huang & Chen, EMNLP 2024)
ACL
Chao-Wei Huang and Yun-Nung Chen. 2024. PairDistill: Pairwise Relevance Distillation for Dense Retrieval. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 18225–18237, Miami, Florida, USA. Association for Computational Linguistics.
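
The abstract describes distilling the preferences of a pairwise reranker (which judges which of two documents is more relevant to a query) into a dense retriever, rather than regressing onto absolute pointwise scores. The sketch below is a minimal illustration of that general idea, not the authors' implementation (see the linked repository for that): the student retriever's score difference on a document pair is pushed toward the teacher's preference probability. All names, tensor shapes, and the binary cross-entropy formulation here are illustrative assumptions.

```python
# Minimal sketch of pairwise relevance distillation (illustrative only;
# not the PairDistill authors' code). A pairwise teacher emits
# P(d_i more relevant than d_j | q); the student dense retriever is
# trained so that the sigmoid of its score difference matches it.
import torch
import torch.nn.functional as F

def pairwise_distill_loss(student_scores: torch.Tensor,
                          teacher_pref: torch.Tensor,
                          pair_idx: torch.Tensor) -> torch.Tensor:
    """student_scores: (n_docs,)  retriever scores for one query's candidates
       teacher_pref:   (n_pairs,) teacher probability that doc i beats doc j
       pair_idx:       (n_pairs, 2) indices (i, j) of each compared pair"""
    s_i = student_scores[pair_idx[:, 0]]
    s_j = student_scores[pair_idx[:, 1]]
    # Student's preference probability, derived from its score difference.
    student_pref = torch.sigmoid(s_i - s_j)
    # Soft binary cross-entropy pulls each pairwise preference of the
    # student toward the teacher's fine-grained pairwise judgment.
    return F.binary_cross_entropy(student_pref, teacher_pref)

# Toy usage: 4 retrieved documents, 3 teacher-scored pairs.
scores = torch.tensor([2.1, 1.7, 0.3, -0.5], requires_grad=True)
pairs = torch.tensor([[0, 1], [1, 2], [0, 3]])
teacher = torch.tensor([0.62, 0.91, 0.98])
loss = pairwise_distill_loss(scores, teacher, pairs)
loss.backward()  # gradients flow back into the retriever's scores
print(float(loss))
```

In practice such a pairwise term would be combined with a standard retrieval objective (e.g., contrastive loss over in-batch negatives), with the teacher's pair preferences precomputed by a pairwise cross-encoder over the retriever's own candidates; those training details are choices of this sketch, not claims about the paper.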