Generating Information-Seeking Conversations from Unlabeled Documents

Gangwoo Kim, Sungdong Kim, Kang Min Yoo, Jaewoo Kang


Abstract
Synthesizing datasets for conversational question answering (CQA) from unlabeled documents remains challenging due to the interactive nature of CQA. Moreover, while modeling information needs is essential, only a few studies have discussed it. In this paper, we introduce a novel framework, **SimSeek** (**Sim**ulating information-**Seek**ing conversation from unlabeled documents), and compare its two variants. In our baseline, **SimSeek-sym**, a questioner generates follow-up questions based on an answer predetermined by an answerer. In contrast, **SimSeek-asym** first generates the question and then finds its corresponding answer under the conversational context. Our experiments show that both variants can synthesize effective training resources for CQA and conversational search tasks. Conversations from **SimSeek-asym** not only yield larger improvements in our experiments but are also rated favorably in a human evaluation. Finally, we release a large-scale resource of synthetic conversations, **Wiki-SimSeek**, containing 2 million CQA pairs built upon Wikipedia documents. With this dataset, our CQA model achieves state-of-the-art performance on a recent CQA benchmark, QuAC. The code and dataset are available at https://github.com/naver-ai/simseek
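The difference between the two variants is the order in which each turn is produced. The following is a minimal sketch of that ordering only, not the released implementation: the function names, the callables (`pick_answer`, `ask_about`, `ask`, `find_answer`), and the toy stand-ins at the bottom are all hypothetical, and letting the asym questioner see only the history is an assumption made for illustration.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (question, answer)


def simulate_sym(
    document: str,
    pick_answer: Callable[[str, List[Turn]], str],
    ask_about: Callable[[str, List[Turn], str], str],
    num_turns: int,
) -> List[Turn]:
    """Answer-first ordering: the answer is fixed, then a question is written for it."""
    history: List[Turn] = []
    for _ in range(num_turns):
        answer = pick_answer(document, history)          # answerer goes first
        question = ask_about(document, history, answer)  # questioner conditions on the answer
        history.append((question, answer))
    return history


def simulate_asym(
    document: str,
    ask: Callable[[List[Turn]], str],
    find_answer: Callable[[str, List[Turn], str], str],
    num_turns: int,
) -> List[Turn]:
    """Question-first ordering: the question is written, then its answer is located."""
    history: List[Turn] = []
    for _ in range(num_turns):
        question = ask(history)                            # assumption: history only
        answer = find_answer(document, history, question)  # answerer reads the document
        history.append((question, answer))
    return history


# Toy stand-ins (hypothetical) so the sketch runs without any trained model.
doc = "SimSeek builds conversations. Wiki-SimSeek has 2 million pairs."
sentences = [s.strip() for s in doc.split(".") if s.strip()]

sym_turns = simulate_sym(
    doc,
    pick_answer=lambda d, h: sentences[len(h) % len(sentences)],
    ask_about=lambda d, h, a: f"What does the text say about '{a.split()[0]}'?",
    num_turns=2,
)
asym_turns = simulate_asym(
    doc,
    ask=lambda h: f"Question {len(h) + 1}?",
    find_answer=lambda d, h, q: sentences[len(h) % len(sentences)],
    num_turns=2,
)
```

In a real system each callable would be a trained model; the sketch isolates the one structural difference, namely which side of the turn is generated first.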
Anthology ID:
2022.emnlp-main.151
Volume:
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Editors:
Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
2362–2378
URL:
https://aclanthology.org/2022.emnlp-main.151
DOI:
10.18653/v1/2022.emnlp-main.151
Cite (ACL):
Gangwoo Kim, Sungdong Kim, Kang Min Yoo, and Jaewoo Kang. 2022. Generating Information-Seeking Conversations from Unlabeled Documents. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 2362–2378, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
Generating Information-Seeking Conversations from Unlabeled Documents (Kim et al., EMNLP 2022)
PDF:
https://aclanthology.org/2022.emnlp-main.151.pdf