Certified Error Control of Candidate Set Pruning for Two-Stage Relevance Ranking

Minghan Li, Xinyu Zhang, Ji Xin, Hongyang Zhang, Jimmy Lin


Abstract
In information retrieval (IR), candidate set pruning is commonly used to speed up two-stage relevance ranking. However, this approach typically trades off accuracy for computational efficiency in a purely empirical fashion, without precise error control or theoretical guarantees. In this paper, we propose certified error control of candidate set pruning for relevance ranking: the test error after pruning is guaranteed to stay below a user-specified threshold with high probability. Both in-domain and out-of-domain experiments show that our method prunes the first-stage retrieved candidate sets to speed up second-stage reranking while satisfying the pre-specified accuracy constraints in both settings. For example, on MS MARCO Passage v1, our method reduces the average candidate set size from 1000 to 27, increasing reranking speed by about 37 times, while keeping MRR@10 above a pre-specified value of 0.38 with about 90% empirical coverage. In contrast, empirical baselines fail to meet such requirements. Code and data are available at: https://github.com/alexlimh/CEC-Ranking.
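
To illustrate the general idea (not the paper's exact procedure), the sketch below calibrates a single candidate-set cutoff k on a set of held-out calibration queries so that a lower confidence bound on the reranked metric (e.g., per-query reciprocal rank, whose mean is MRR@10) stays above a user-specified target with probability at least 1 − δ. The Hoeffding bound, the large-to-small fixed-sequence scan, and all function and variable names are assumptions made for this illustration; the paper's actual certification method may differ.

```python
import numpy as np

def hoeffding_lower_bound(values, delta):
    """One-sided Hoeffding lower confidence bound on the mean of values in [0, 1].

    With probability at least 1 - delta, the true mean is >= the returned value.
    (Illustrative choice; any valid lower confidence bound could be substituted.)
    """
    n = len(values)
    return values.mean() - np.sqrt(np.log(1.0 / delta) / (2.0 * n))

def calibrate_cutoff(per_query_rr, target, delta):
    """Pick the smallest certified candidate-set cutoff k.

    per_query_rr: array of shape (n_queries, max_k); entry [i, k-1] is the
        reciprocal rank obtained for calibration query i when only the top-k
        first-stage candidates are passed to the reranker.
    target: the pre-specified metric threshold (e.g., 0.38 for MRR@10).
    delta:  allowed failure probability of the guarantee.

    Cutoffs are tested from largest to smallest and we stop at the first
    failure (a simple fixed-sequence testing scheme, so no multiplicity
    correction is needed for the cutoffs that passed). If even the full
    candidate set fails, max_k is returned, i.e., no pruning is certified.
    """
    n_queries, max_k = per_query_rr.shape
    best_k = max_k
    for k in range(max_k, 0, -1):          # shrink the candidate set step by step
        lcb = hoeffding_lower_bound(per_query_rr[:, k - 1], delta)
        if lcb >= target:
            best_k = k                      # this cutoff is certified; try a smaller one
        else:
            break                           # stop at the first cutoff that fails
    return best_k

# Toy usage with synthetic reciprocal ranks (replace with real reranker outputs).
rng = np.random.default_rng(0)
rr = np.clip(rng.normal(0.42, 0.2, size=(500, 1000)), 0.0, 1.0)
rr.sort(axis=1)                             # assume larger cutoffs never hurt the metric
k_star = calibrate_cutoff(rr, target=0.38, delta=0.1)
print(f"certified cutoff k = {k_star}")
```

Scanning cutoffs from large to small orders the hypotheses from most to least likely to hold, which is what makes the stop-at-first-failure scheme statistically valid while still allowing aggressive pruning when the data supports it.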
Anthology ID:
2022.emnlp-main.23
Volume:
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Editors:
Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
333–345
URL:
https://aclanthology.org/2022.emnlp-main.23
DOI:
10.18653/v1/2022.emnlp-main.23
Cite (ACL):
Minghan Li, Xinyu Zhang, Ji Xin, Hongyang Zhang, and Jimmy Lin. 2022. Certified Error Control of Candidate Set Pruning for Two-Stage Relevance Ranking. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 333–345, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
Certified Error Control of Candidate Set Pruning for Two-Stage Relevance Ranking (Li et al., EMNLP 2022)
PDF:
https://aclanthology.org/2022.emnlp-main.23.pdf