Pseudo-Relevance for Enhancing Document Representation

Jihyuk Kim, Seung-won Hwang, Seoho Song, Hyeseon Ko, Young-In Song


Abstract
This paper studies how to enhance the document representation for the bi-encoder approach in dense document retrieval. The bi-encoder, which encodes a query and a document separately, each as a single vector, is favored for its efficiency in large-scale information retrieval over more effective but more complex architectures. To combine the strengths of the two, multi-vector document representations for the bi-encoder, such as ColBERT, which preserves all token embeddings, have been widely adopted. Our contribution is to reduce the size of the multi-vector representation without compromising effectiveness, using supervision from query logs. Our proposed solution reduces latency and memory footprint by up to 8-fold and 3-fold, respectively, as validated on MSMARCO and real-world search query logs.
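
As an illustration of the two scoring schemes contrasted in the abstract, the following minimal Python sketch compares single-vector bi-encoder scoring with ColBERT-style late interaction, where each query token is matched to its most similar document token and the maxima are summed. The toy embeddings, the function names, and the index-based `prune` helper are assumptions made for illustration. In particular, `prune` only shows what storing fewer document vectors means; it is not the selection method proposed in the paper.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def single_vector_score(query_vec: Vector, doc_vec: Vector) -> float:
    """Bi-encoder scoring: one vector per query, one vector per document."""
    return dot(query_vec, doc_vec)


def maxsim_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """ColBERT-style late interaction: for each query token, take the best
    matching document token, then sum those maxima over the query."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def prune(doc_tokens: List[Vector], keep: List[int]) -> List[Vector]:
    """Hypothetical helper: keep only the document vectors at the given
    indices. How to choose the indices is the hard part and is not shown."""
    return [doc_tokens[i] for i in keep]


# Toy 2-dimensional token embeddings (made up for this example).
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

full_score = maxsim_score(query_tokens, doc_tokens)
pruned_score = maxsim_score(query_tokens, prune(doc_tokens, [0, 2]))
print(full_score, pruned_score)
```

Because the maximum is taken over the stored document vectors, dropping vectors can never raise the MaxSim score. The index size shrinks in proportion to the number of vectors removed, so the practical question is which vectors can be dropped with the least loss in ranking quality.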
Anthology ID:
2022.emnlp-main.800
Volume:
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Editors:
Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
11639–11652
URL:
https://aclanthology.org/2022.emnlp-main.800
DOI:
10.18653/v1/2022.emnlp-main.800
Cite (ACL):
Jihyuk Kim, Seung-won Hwang, Seoho Song, Hyeseon Ko, and Young-In Song. 2022. Pseudo-Relevance for Enhancing Document Representation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 11639–11652, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
Pseudo-Relevance for Enhancing Document Representation (Kim et al., EMNLP 2022)
PDF:
https://aclanthology.org/2022.emnlp-main.800.pdf
Software:
 2022.emnlp-main.800.software.zip