Evaluation of Question Answer Generation for Portuguese: Insights and Datasets

Felipe Paula, Cassiana Michelin, Viviane Moreira


Abstract
Automatic question generation is an increasingly important task that can be applied in different settings, including educational purposes, data augmentation for question-answering (QA), and conversational systems. More specifically, we focus on question answer generation (QAG), which produces question-answer pairs given an input context. We adapt and apply QAG approaches to generate question-answer pairs for different domains and assess their capacity to generate accurate, diverse, and abundant question-answer pairs. Our analyses combine both qualitative and quantitative evaluations that allow insights into the quality and types of errors made by QAG methods. We also look into strategies for error filtering and their effects. Our work concentrates on Portuguese, a widely spoken language that is underrepresented in natural language processing research. To address the pressing need for resources, we generate and make available human-curated extractive QA datasets in three diverse domains.
Anthology ID:
2024.findings-emnlp.306
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
5315–5327
URL:
https://aclanthology.org/2024.findings-emnlp.306
Cite (ACL):
Felipe Paula, Cassiana Michelin, and Viviane Moreira. 2024. Evaluation of Question Answer Generation for Portuguese: Insights and Datasets. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 5315–5327, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Evaluation of Question Answer Generation for Portuguese: Insights and Datasets (Paula et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-emnlp.306.pdf