%0 Conference Proceedings
%T ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking Inference
%A Hui, Kai
%A Zhuang, Honglei
%A Chen, Tao
%A Qin, Zhen
%A Lu, Jing
%A Bahri, Dara
%A Ma, Ji
%A Gupta, Jai
%A Nogueira dos Santos, Cicero
%A Tay, Yi
%A Metzler, Donald
%Y Muresan, Smaranda
%Y Nakov, Preslav
%Y Villavicencio, Aline
%S Findings of the Association for Computational Linguistics: ACL 2022
%D 2022
%8 May
%I Association for Computational Linguistics
%C Dublin, Ireland
%F hui-etal-2022-ed2lm
%X State-of-the-art neural models typically encode document-query pairs using cross-attention for re-ranking. To this end, models generally utilize an encoder-only (like BERT) paradigm or an encoder-decoder (like T5) approach. These paradigms, however, are not without flaws, i.e., running the model on all query-document pairs at inference time incurs a significant computational cost. This paper proposes a new training and inference paradigm for re-ranking. We propose to finetune a pretrained encoder-decoder model in the form of document-to-query generation. Subsequently, we show that this encoder-decoder architecture can be decomposed into a decoder-only language model during inference. This results in significant inference-time speedups, since the decoder-only architecture only needs to learn to interpret static encoder embeddings during inference. Our experiments show that this new paradigm achieves results that are comparable to the more expensive cross-attention ranking approaches while being up to 6.8X faster. We believe this work paves the way for more efficient neural rankers that leverage large pretrained models.
%R 10.18653/v1/2022.findings-acl.295
%U https://aclanthology.org/2022.findings-acl.295
%U https://doi.org/10.18653/v1/2022.findings-acl.295
%P 3747-3758