2021
Learning to Rank in the Age of Muppets: Effectiveness–Efficiency Tradeoffs in Multi-Stage Ranking
Yue Zhang, ChengCheng Hu, Yuqi Liu, Hui Fang, Jimmy Lin
Proceedings of the Second Workshop on Simple and Efficient Natural Language Processing
It is well known that rerankers built on pretrained transformer models such as BERT have dramatically improved retrieval effectiveness in many tasks. However, these gains have come at substantial costs in terms of efficiency, as noted by many researchers. In this work, we show that it is possible to retain the benefits of transformer-based rerankers in a multi-stage reranking pipeline by first using feature-based learning-to-rank techniques to reduce the number of candidate documents under consideration without adversely affecting their quality in terms of recall. Applied to the MS MARCO passage and document ranking tasks, we are able to achieve the same level of effectiveness, but with up to 18× increase in efficiency. Furthermore, our techniques are orthogonal to other methods focused on accelerating transformer inference, and thus can be combined for even greater efficiency gains. A higher-level message from our work is that, even though pretrained transformers dominate the modern IR landscape, there are still important roles for “traditional” LTR techniques, and that we should not forget history.
2020
Covidex: Neural Ranking Models and Keyword Search Infrastructure for the COVID-19 Open Research Dataset
Edwin Zhang, Nikhil Gupta, Raphael Tang, Xiao Han, Ronak Pradeep, Kuang Lu, Yue Zhang, Rodrigo Nogueira, Kyunghyun Cho, Hui Fang, Jimmy Lin
Proceedings of the First Workshop on Scholarly Document Processing
We present Covidex, a search engine that exploits the latest neural ranking models to provide information access to the COVID-19 Open Research Dataset curated by the Allen Institute for AI. Our system has been online and serving users since late March 2020. The Covidex is the user application component of our three-pronged strategy to develop technologies for helping domain experts tackle the ongoing global pandemic. In addition, we provide robust and easy-to-use keyword search infrastructure that exploits mature fusion-based methods as well as standalone neural ranking models that can be incorporated into other applications. These techniques have been evaluated in the multi-round TREC-COVID challenge: Our infrastructure and baselines have been adopted by many participants, including some of the best systems. In round 3, we submitted the highest-scoring run that took advantage of previous training data and the second-highest fully automatic run. In rounds 4 and 5, we submitted the highest-scoring fully automatic runs.
2014
A Study of Concept-based Weighting Regularization for Medical Records Search
Yue Wang, Xitong Liu, Hui Fang
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Identifying Important Features for Graph Retrieval
Zhuo Li, Sandra Carberry, Hui Fang, Kathleen McCoy
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers
2008
A Re-examination of Query Expansion Using Lexical Resources
Hui Fang
Proceedings of ACL-08: HLT