Michelle Chen Huebscher
2022
Decoding a Neural Retriever’s Latent Space for Query Suggestion
Leonard Adolphs | Michelle Chen Huebscher | Christian Buck | Sertan Girgin | Olivier Bachem | Massimiliano Ciaramita | Thomas Hofmann
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Neural retrieval models have superseded classic bag-of-words methods such as BM25 as the retrieval framework of choice. However, neural systems lack the interpretability of bag-of-words models; it is not trivial to connect a query change to a change in the latent space that ultimately determines the retrieval results. To shed light on this embedding space, we learn a “query decoder” that, given a latent representation of a neural search engine, generates the corresponding query. We show that it is possible to decode a meaningful query from its latent representation and, when moving in the right direction in latent space, to decode a query that retrieves the relevant paragraph. In particular, the query decoder can be useful to understand “what should have been asked” to retrieve a particular paragraph from the collection. We employ the query decoder to generate a large synthetic dataset of query reformulations for MSMarco, leading to improved retrieval performance. On this data, we train a pseudo-relevance feedback (PRF) T5 model for the application of query suggestion that outperforms both query reformulation and PRF information retrieval baselines.