Flexible retrieval with NMSLIB and FlexNeuART

Leonid Boytsov, Eric Nyberg


Abstract
Our objective is to introduce NMSLIB to the NLP community and to describe a new retrieval toolkit, FlexNeuART, as well as their integration capabilities. NMSLIB, while being one of the fastest k-NN search libraries, is quite generic and supports a variety of distance/similarity functions. Because the library relies on distance-based, structure-agnostic algorithms, it can be further extended by adding new distances. FlexNeuART is a modular, extensible, and flexible toolkit for candidate generation in IR and QA applications, which supports mixing classic and neural ranking signals. FlexNeuART can efficiently retrieve mixed dense and sparse representations (with weights learned from training data), which is achieved by extending NMSLIB. In contrast, other retrieval systems work with purely sparse representations (e.g., Lucene), purely dense representations (e.g., FAISS and Annoy), or only perform mixing at the re-ranking stage.
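The idea of retrieving mixed dense and sparse representations can be illustrated with a small brute-force sketch. It assumes that a mixed representation is scored as a weighted sum of a dense inner product and a sparse inner product; the names MixedVector, mixed_inner_product, and knn are hypothetical and are not part of the NMSLIB or FlexNeuART APIs. The search routine touches documents only through a similarity callback, which loosely mirrors (in a much simpler, exhaustive form) why a distance-based, structure-agnostic library can be extended with new distances.

```python
import heapq
from dataclasses import dataclass, field
from functools import partial
from typing import Callable, Dict, List, Sequence


@dataclass
class MixedVector:
    """Hypothetical mixed representation: a dense embedding plus sparse term weights."""
    dense: List[float]
    sparse: Dict[int, float] = field(default_factory=dict)  # term id -> weight


def mixed_inner_product(q: MixedVector, d: MixedVector,
                        w_dense: float = 1.0, w_sparse: float = 1.0) -> float:
    """Assumed similarity: weighted sum of dense and sparse inner products.

    The weights are plain arguments here; the abstract states that
    FlexNeuART learns them from training data.
    """
    dense_score = sum(a * b for a, b in zip(q.dense, d.dense))
    # Iterate over the shorter sparse map and look up its terms in the longer one.
    short, long_ = sorted((q.sparse, d.sparse), key=len)
    sparse_score = sum(w * long_.get(t, 0.0) for t, w in short.items())
    return w_dense * dense_score + w_sparse * sparse_score


def knn(query, docs: Sequence, similarity: Callable, k: int) -> List[int]:
    """Exhaustive top-k search that accesses documents only via `similarity`.

    Real NMSLIB indices (e.g., graph-based ones) avoid scanning every document;
    this loop only illustrates the pluggable-similarity idea.
    """
    return heapq.nlargest(k, range(len(docs)), key=lambda i: similarity(query, docs[i]))


# Toy collection: three documents with dense and sparse parts.
docs = [
    MixedVector([1.0, 0.0], {1: 1.0}),
    MixedVector([0.0, 1.0], {2: 2.0}),
    MixedVector([0.5, 0.5], {1: 0.5, 2: 0.5}),
]
query = MixedVector([0.0, 1.0], {1: 1.0})

# Different weightings of the same mixed similarity yield different rankings.
dense_only = partial(mixed_inner_product, w_dense=1.0, w_sparse=0.0)
sparse_only = partial(mixed_inner_product, w_dense=0.0, w_sparse=1.0)
print(knn(query, docs, dense_only, k=2))   # [1, 2]
print(knn(query, docs, sparse_only, k=2))  # [0, 2]
```

Replacing mixed_inner_product with any other function of two MixedVector objects requires no change to knn; only the scoring callback changes.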
Anthology ID:
2020.nlposs-1.6
Volume:
Proceedings of Second Workshop for NLP Open Source Software (NLP-OSS)
Month:
November
Year:
2020
Address:
Online
Editors:
Eunjeong L. Park, Masato Hagiwara, Dmitrijs Milajevs, Nelson F. Liu, Geeticka Chauhan, Liling Tan
Venue:
NLPOSS
Publisher:
Association for Computational Linguistics
Pages:
32–43
URL:
https://aclanthology.org/2020.nlposs-1.6
DOI:
10.18653/v1/2020.nlposs-1.6
Cite (ACL):
Leonid Boytsov and Eric Nyberg. 2020. Flexible retrieval with NMSLIB and FlexNeuART. In Proceedings of Second Workshop for NLP Open Source Software (NLP-OSS), pages 32–43, Online. Association for Computational Linguistics.
Cite (Informal):
Flexible retrieval with NMSLIB and FlexNeuART (Boytsov & Nyberg, NLPOSS 2020)
PDF:
https://aclanthology.org/2020.nlposs-1.6.pdf
Video:
https://slideslive.com/38939743
Code
oaqa/FlexNeuART
Data
MS MARCO