COIL: Revisit Exact Lexical Match in Information Retrieval with Contextualized Inverted List

Luyu Gao, Zhuyun Dai, Jamie Callan


Abstract
Classical information retrieval systems such as BM25 rely on exact lexical match and can carry out search efficiently with an inverted list index. Recent neural IR models shift towards soft matching of all query and document terms, but they lose the computational efficiency of exact match systems. This paper presents COIL, a contextualized exact match retrieval architecture, where scoring is based on the contextualized representations of overlapping query and document tokens. The new architecture stores contextualized token representations in inverted lists, bringing together the efficiency of exact match and the representation power of deep language models. Our experimental results show COIL outperforms classical lexical retrievers and state-of-the-art deep LM retrievers with similar or smaller latency.
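The idea of storing token vectors in inverted lists and scoring only exactly matching tokens can be illustrated with a minimal sketch. It assumes that each query token's contribution is the maximum dot product over the same token's occurrences in a document, and that contributions are summed; the abstract does not spell out this aggregation. The function names (`build_index`, `search`) and the two-dimensional toy vectors are hypothetical stand-ins for language-model outputs and are not taken from the luyug/COIL code.

```python
from collections import defaultdict


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def build_index(docs):
    """Build inverted lists: token -> [(doc_id, vector), ...].

    `docs` maps doc_id to a list of (token, vector) pairs; the vectors
    are assumed to be contextualized token representations.
    """
    index = defaultdict(list)
    for doc_id, tokens in docs.items():
        for token, vec in tokens:
            index[token].append((doc_id, vec))
    return index


def search(index, query):
    """Score documents using only tokens that exactly match a query token."""
    scores = defaultdict(float)
    for token, q_vec in query:
        best = {}  # best match per document for this query token
        for doc_id, d_vec in index.get(token, []):
            s = dot(q_vec, d_vec)
            if doc_id not in best or s > best[doc_id]:
                best[doc_id] = s
        # Assumed aggregation: sum the per-token maxima.
        for doc_id, s in best.items():
            scores[doc_id] += s
    return dict(scores)


# Toy data (made up): "bank" gets different vectors in different contexts.
docs = {
    "d1": [("bank", [1.0, 0.0]), ("river", [0.0, 1.0])],
    "d2": [("bank", [0.0, 1.0]), ("loan", [1.0, 0.0]), ("bank", [0.5, 0.5])],
}
index = build_index(docs)
query = [("bank", [0.0, 1.0]), ("loan", [1.0, 0.0])]
scores = search(index, query)
print(scores)
```

Both documents contain the token "bank", but only the posting lists for the query's own tokens are visited, and the vector comparison separates the two senses: in this toy setup `d2` scores higher than `d1` even though both match lexically.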
Anthology ID:
2021.naacl-main.241
Volume:
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
June
Year:
2021
Address:
Online
Editors:
Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, Yichao Zhou
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
3030–3042
URL:
https://aclanthology.org/2021.naacl-main.241
DOI:
10.18653/v1/2021.naacl-main.241
Cite (ACL):
Luyu Gao, Zhuyun Dai, and Jamie Callan. 2021. COIL: Revisit Exact Lexical Match in Information Retrieval with Contextualized Inverted List. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3030–3042, Online. Association for Computational Linguistics.
Cite (Informal):
COIL: Revisit Exact Lexical Match in Information Retrieval with Contextualized Inverted List (Gao et al., NAACL 2021)
PDF:
https://aclanthology.org/2021.naacl-main.241.pdf
Video:
https://aclanthology.org/2021.naacl-main.241.mp4
Code:
luyug/COIL