CONVERSER: Few-shot Conversational Dense Retrieval with Synthetic Data Generation

Chao-Wei Huang, Chen-Yu Hsu, Tsu-Yuan Hsu, Chen-An Li, Yun-Nung Chen


Abstract
Conversational search provides a natural interface for information retrieval (IR). Recent approaches have demonstrated promising results in applying dense retrieval to conversational IR. However, training dense retrievers requires large amounts of in-domain paired data. This hinders the development of conversational dense retrievers, as abundant in-domain conversations are expensive to collect. In this paper, we propose Converser, a framework for training conversational dense retrievers with at most 6 examples of in-domain dialogues. Specifically, we utilize the in-context learning capability of large language models to generate conversational queries given a passage in the retrieval corpus. Experimental results on conversational retrieval benchmarks OR-QuAC and TREC CAsT 19 show that the proposed Converser achieves comparable performance to fully-supervised models, demonstrating the effectiveness of our proposed framework in few-shot conversational dense retrieval. All source code and generated datasets are available: https://github.com/MiuLab/CONVERSER
Anthology ID:
2023.sigdial-1.34
Volume:
Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Month:
September
Year:
2023
Address:
Prague, Czechia
Editors:
Svetlana Stoyanchev, Shafiq Joty, David Schlangen, Ondrej Dusek, Casey Kennington, Malihe Alikhani
Venue:
SIGDIAL
SIG:
SIGDIAL
Publisher:
Association for Computational Linguistics
Note:
Pages:
381–387
Language:
URL:
https://aclanthology.org/2023.sigdial-1.34
DOI:
10.18653/v1/2023.sigdial-1.34
Bibkey:
Cite (ACL):
Chao-Wei Huang, Chen-Yu Hsu, Tsu-Yuan Hsu, Chen-An Li, and Yun-Nung Chen. 2023. CONVERSER: Few-shot Conversational Dense Retrieval with Synthetic Data Generation. In Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 381–387, Prague, Czechia. Association for Computational Linguistics.
Cite (Informal):
CONVERSER: Few-shot Conversational Dense Retrieval with Synthetic Data Generation (Huang et al., SIGDIAL 2023)
Copy Citation:
PDF:
https://aclanthology.org/2023.sigdial-1.34.pdf