Query Generation with External Knowledge for Dense Retrieval

Sukmin Cho, Soyeong Jeong, Wonsuk Yang, Jong Park


Abstract
Dense retrieval aims to find the documents most relevant to a given query by encoding texts in an embedding space, and training a dense retriever requires a large number of query-document pairs. Since manually constructing such training data is challenging, recent work has proposed generating synthetic queries from documents and using them to train a dense retriever. However, unlike manually composed queries, synthetic queries generally do not ask for implicit information, which leads to degraded retrieval performance. In this work, we propose Query Generation with External Knowledge (QGEK), a novel method for generating queries with external information related to the corresponding document. Specifically, we convert a query into a triplet-based template form to accommodate external information and feed it to a pre-trained language model (PLM). We validate QGEK in both in-domain and out-of-domain dense retrieval settings. Dense retrievers trained with queries that require implicit information show notable performance improvements. Moreover, such queries resemble manually composed queries, as confirmed by both human evaluation and the distribution of unique and non-unique words.
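As a rough illustration of the dense retrieval setting described in the abstract, the sketch below ranks documents by the inner product between a query embedding and each document embedding. It assumes toy, hand-written 3-dimensional vectors in place of a trained encoder; the names `dot`, `rank_documents`, `docs`, and `query` are hypothetical, and this is neither QGEK itself nor the authors' retriever.

```python
# Minimal sketch of dense retrieval scoring.
# Assumption: toy, hand-written embeddings stand in for the output of a
# trained query/document encoder; all names here are hypothetical.
from typing import Dict, List, Tuple


def dot(u: List[float], v: List[float]) -> float:
    """Inner product between two embedding vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def rank_documents(query_emb: List[float],
                   doc_embs: Dict[str, List[float]],
                   k: int = 2) -> List[Tuple[str, float]]:
    """Return the top-k (doc_id, score) pairs, highest inner product first."""
    scored = [(doc_id, dot(query_emb, emb)) for doc_id, emb in doc_embs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional embeddings, for illustration only.
docs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.8, 0.3],
    "doc_c": [0.4, 0.4, 0.4],
}
query = [1.0, 0.2, 0.0]
print(rank_documents(query, docs, k=2))
```

In practice the embeddings come from a learned encoder, and the quality of the query-document training pairs (synthetic or manually composed) determines how well the inner product reflects relevance.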
Anthology ID:
2022.deelio-1.3
Volume:
Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures
Month:
May
Year:
2022
Address:
Dublin, Ireland and Online
Editors:
Eneko Agirre, Marianna Apidianaki, Ivan Vulić
Venue:
DeeLIO
Publisher:
Association for Computational Linguistics
Pages:
22–32
URL:
https://aclanthology.org/2022.deelio-1.3
DOI:
10.18653/v1/2022.deelio-1.3
Cite (ACL):
Sukmin Cho, Soyeong Jeong, Wonsuk Yang, and Jong Park. 2022. Query Generation with External Knowledge for Dense Retrieval. In Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures, pages 22–32, Dublin, Ireland and Online. Association for Computational Linguistics.
Cite (Informal):
Query Generation with External Knowledge for Dense Retrieval (Cho et al., DeeLIO 2022)
PDF:
https://aclanthology.org/2022.deelio-1.3.pdf
Video:
https://aclanthology.org/2022.deelio-1.3.mp4
Data:
ConceptNet, FEVER, HotpotQA, MS MARCO, SciDocs, SciFact, SimpleQuestions, TREC-COVID