Building a Dataset for Automatically Learning to Detect Questions Requiring Clarification

Ivano Lauriola, Kevin Small, Alessandro Moschitti


Abstract
Question Answering (QA) systems aim to return correct and concise answers in response to user questions. QA research generally assumes all questions are intelligible and unambiguous, which is unrealistic in practice as questions frequently encountered by virtual assistants are ambiguous or noisy. In this work, we propose to make QA systems more robust via the following two-step process: (1) classify if the input question is intelligible and (2) for such questions with contextual ambiguity, return a clarification question. We describe a new open-domain clarification corpus containing user questions sampled from Quora, which is useful for building machine learning approaches to solving these tasks.
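
The abstract describes a two-step process: first decide whether the question is intelligible, then, for ambiguous inputs, return a clarification question. A minimal sketch of how such a pipeline could be wired together is shown below; the TF-IDF plus logistic-regression intelligibility classifier, the toy training examples, and the template clarification are illustrative assumptions, not the paper's actual models or data.

```python
# Hedged sketch of the two-step robustness process from the abstract.
# The labels, example questions, and models here are placeholders, not the
# authors' implementation or the released corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Step 1: train a binary classifier that flags questions needing clarification.
train_questions = [
    "Who wrote Hamlet?",
    "How tall is it?",                 # ambiguous: unresolved referent
    "What is the capital of France?",
    "When does the thing start?",      # ambiguous: vague noun phrase
]
train_labels = [1, 0, 1, 0]            # 1 = intelligible, 0 = needs clarification

intelligibility_clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
intelligibility_clf.fit(train_questions, train_labels)

def run_qa_system(question: str) -> str:
    """Placeholder for an existing downstream QA backend."""
    return f"<answer to: {question}>"

def answer_or_clarify(question: str) -> str:
    """Answer intelligible questions; otherwise return a clarification question."""
    if intelligibility_clf.predict([question])[0] == 1:
        return run_qa_system(question)
    # Step 2: ask for clarification (a fixed template here; the paper's corpus
    # would instead supply question-specific clarifications).
    return f"Could you clarify what you are referring to in: '{question}'?"

print(answer_or_clarify("How tall is it?"))
```

In practice the classifier in step 1 would be trained on the Quora-derived clarification corpus the paper introduces, and step 2 could be a generation model rather than a template.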
Anthology ID:
2022.lrec-1.502
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
4701–4707
URL:
https://aclanthology.org/2022.lrec-1.502
Cite (ACL):
Ivano Lauriola, Kevin Small, and Alessandro Moschitti. 2022. Building a Dataset for Automatically Learning to Detect Questions Requiring Clarification. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 4701–4707, Marseille, France. European Language Resources Association.
Cite (Informal):
Building a Dataset for Automatically Learning to Detect Questions Requiring Clarification (Lauriola et al., LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.502.pdf