Complex question generation using discourse-based data augmentation

Khushnur Jahangir, Philippe Muller, Chloé Braud


Abstract
Question Generation (QG), the process of generating meaningful questions from a given context, has proven useful for several tasks such as question answering or FAQ generation. While most existing QG techniques generate simple, fact-based questions, this research aims to generate questions that can have complex answers (e.g., “why” questions). We propose a data augmentation method that uses discourse relations to create such questions, and experiment on existing English data. Our approach generates questions based solely on the context, without answer supervision, in order to enhance question diversity and complexity. We use an encoder-decoder trained on the augmented dataset to generate either one question or multiple questions at a time, and show that the latter improves over the baseline model in a human quality evaluation, without degrading performance according to standard automated metrics.
Anthology ID:
2024.codi-1.10
Volume:
Proceedings of the 5th Workshop on Computational Approaches to Discourse (CODI 2024)
Month:
March
Year:
2024
Address:
St. Julians, Malta
Editors:
Michael Strube, Chloe Braud, Christian Hardmeier, Junyi Jessy Li, Sharid Loaiciga, Amir Zeldes, Chuyuan Li
Venues:
CODI | WS
Publisher:
Association for Computational Linguistics
Pages:
105–119
URL:
https://aclanthology.org/2024.codi-1.10
Cite (ACL):
Khushnur Jahangir, Philippe Muller, and Chloé Braud. 2024. Complex question generation using discourse-based data augmentation. In Proceedings of the 5th Workshop on Computational Approaches to Discourse (CODI 2024), pages 105–119, St. Julians, Malta. Association for Computational Linguistics.
Cite (Informal):
Complex question generation using discourse-based data augmentation (Jahangir et al., CODI-WS 2024)
PDF:
https://aclanthology.org/2024.codi-1.10.pdf
Supplementary material:
 2024.codi-1.10.SupplementaryMaterial.zip