RAC: Retrieval-augmented Conversation Dataset for Open-domain Question Answering in Conversational Settings

Bonggeun Choi; JeongJae Park; Yoonsung Kim; Jaehyun Park; Youngjoong Ko

doi:10.18653/v1/2024.emnlp-industry.108

Abstract

In recent years, conversational question answering (CQA) has advanced significantly, driven by the rapid growth of large language models and by the integration of retrieval mechanisms that draw on external knowledge to generate accurate and contextually relevant responses. Consequently, conversational search and retrieval-augmented generation (RAG) have attracted substantial attention for their capacity to address two key challenges: rewriting queries within conversational histories to improve retrieval performance, and generating responses from retrieved knowledge. However, the two fields are often studied independently, and end-to-end systems that combine them remain underexplored. In this work, we present a novel retrieval-augmented conversation (RAC) dataset and develop a baseline system comprising query rewriting, retrieval, reranking, and response generation stages. Experimental results demonstrate the competitiveness of the system, and extensive analyses examine the impact of retrieval results on response generation.

Anthology ID: 2024.emnlp-industry.108
Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month: November
Year: 2024
Address: Miami, Florida, US
Editors: Franck Dernoncourt, Daniel Preoţiuc-Pietro, Anastasia Shimorina
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 1477–1488
URL: https://aclanthology.org/2024.emnlp-industry.108/
DOI: 10.18653/v1/2024.emnlp-industry.108
Cite (ACL): Bonggeun Choi, JeongJae Park, Yoonsung Kim, Jaehyun Park, and Youngjoong Ko. 2024. RAC: Retrieval-augmented Conversation Dataset for Open-domain Question Answering in Conversational Settings. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 1477–1488, Miami, Florida, US. Association for Computational Linguistics.
Cite (Informal): RAC: Retrieval-augmented Conversation Dataset for Open-domain Question Answering in Conversational Settings (Choi et al., EMNLP 2024)
PDF: https://aclanthology.org/2024.emnlp-industry.108.pdf
Poster: 2024.emnlp-industry.108.poster.pdf
