QUADRo: Dataset and Models for QUestion-Answer Database Retrieval

Stefano Campese, Ivano Lauriola, Alessandro Moschitti


Abstract
An effective approach to designing automated Question Answering (QA) systems is to efficiently retrieve answers from pre-computed databases containing question/answer pairs. One of the main challenges of this design is the lack of training and testing data. Existing resources are limited in size and topic coverage, and they either ignore answers (covering question-question similarity only) or do not consider answer quality in the annotation process. To fill this gap, we introduce a novel open-domain annotated resource to train and evaluate models for this task. The resource consists of 15,211 input questions. Each question is paired with 30 similar question/answer pairs, resulting in a total of 443,000 annotated examples. The binary label associated with each pair indicates its relevance with respect to the input question. Furthermore, we report extensive experiments that test the quality and properties of our resource with respect to various key aspects of QA systems, including answer relevance, training strategies, and model input configuration.
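The structure the abstract describes (an input question, a list of candidate question/answer pairs, and one binary relevance label per pair) can be illustrated with a minimal sketch. The class names, field names, and the token-overlap scorer below are hypothetical and chosen only for illustration. They are not the released data format or the models from the paper. The sketch shows one simple way such annotations could be used to score a ranker with precision at rank 1.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    """A stored question/answer pair (hypothetical field names)."""
    question: str
    answer: str
    relevant: bool  # binary label with respect to the input question


@dataclass
class Example:
    """One input question with its annotated candidate pairs."""
    query: str
    candidates: List[Candidate]


def token_overlap(query: str, cand: Candidate) -> float:
    """Toy scorer (an assumption, not the paper's model):
    Jaccard overlap between query tokens and candidate-question tokens."""
    q = set(query.lower().split())
    c = set(cand.question.lower().split())
    union = q | c
    return len(q & c) / len(union) if union else 0.0


def precision_at_1(
    examples: List[Example],
    scorer: Callable[[str, Candidate], float],
) -> float:
    """Fraction of input questions whose top-scored candidate is labeled relevant."""
    if not examples:
        return 0.0
    hits = 0
    for ex in examples:
        if not ex.candidates:
            continue  # no candidates: counts as a miss
        top = max(ex.candidates, key=lambda cand: scorer(ex.query, cand))
        hits += int(top.relevant)
    return hits / len(examples)


demo = Example(
    query="how tall is mount everest",
    candidates=[
        Candidate("how high is mount everest", "About 8,849 metres.", True),
        Candidate("how old is the earth", "About 4.5 billion years.", False),
    ],
)
score = precision_at_1([demo], token_overlap)
```

In practice the scorer would be a trained ranking model, and it could also read the candidate's answer text, which is the kind of input configuration the abstract mentions.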
Anthology ID:
2023.findings-emnlp.1042
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
15573–15587
URL:
https://aclanthology.org/2023.findings-emnlp.1042
DOI:
10.18653/v1/2023.findings-emnlp.1042
Cite (ACL):
Stefano Campese, Ivano Lauriola, and Alessandro Moschitti. 2023. QUADRo: Dataset and Models for QUestion-Answer Database Retrieval. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 15573–15587, Singapore. Association for Computational Linguistics.
Cite (Informal):
QUADRo: Dataset and Models for QUestion-Answer Database Retrieval (Campese et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-emnlp.1042.pdf