Generative Interpretation: Toward Human-Like Evaluation for Educational Question-Answer Pair Generation

Hyeonseok Moon, Jaewook Lee, Sugyeong Eo, Chanjun Park, Jaehyung Seo, Heuiseok Lim

Abstract
Educational question-answer generation has been extensively studied owing to its practical applicability. However, we identify a persistent challenge in the evaluation of such systems: existing evaluation methods often fail to produce objective results and instead exhibit a bias toward high similarity with the ground-truth question-answer pairs. In this study, we demonstrate that these evaluation methods yield low human alignment and propose an alternative approach, Generative Interpretation (GI), to achieve more objective evaluation. Through experimental analysis, we show that GI outperforms existing evaluation methods in terms of human alignment, and even achieves performance comparable to GPT-3.5 while using only BART-large.
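The abstract's central measure is "human alignment": how closely an automatic metric's scores track human judgments of generated question-answer pairs. As a minimal illustrative sketch (not the paper's GI implementation; all score values and variable names below are hypothetical), human alignment is commonly quantified as the correlation between metric scores and human ratings over the same set of QA pairs:

# Sketch: human alignment of an automatic metric, measured as the
# correlation between its scores and human judgments on the same
# generated QA pairs. Hypothetical numbers; not the paper's GI method.
from scipy.stats import pearsonr, spearmanr

# Hypothetical human quality ratings (e.g., 1-5 Likert) for five QA pairs.
human_scores = [4.5, 2.0, 3.5, 5.0, 1.5]

# Hypothetical scores from two automatic metrics on the same QA pairs.
similarity_metric = [0.90, 0.70, 0.75, 0.85, 0.65]  # e.g., reference-overlap score
candidate_metric = [0.88, 0.35, 0.60, 0.95, 0.20]   # e.g., a learned evaluator

for name, scores in [("similarity-based", similarity_metric),
                     ("candidate", candidate_metric)]:
    r, _ = pearsonr(scores, human_scores)       # linear correlation
    rho, _ = spearmanr(scores, human_scores)    # rank correlation
    print(f"{name}: Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")

A metric with higher correlation against human ratings is said to have higher human alignment, which is the sense in which the paper compares GI to existing evaluation methods.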
Anthology ID: 2024.findings-eacl.145
Volume: Findings of the Association for Computational Linguistics: EACL 2024
Month: March
Year: 2024
Address: St. Julian’s, Malta
Editors: Yvette Graham, Matthew Purver
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 2185–2196
URL: https://aclanthology.org/2024.findings-eacl.145
Cite (ACL): Hyeonseok Moon, Jaewook Lee, Sugyeong Eo, Chanjun Park, Jaehyung Seo, and Heuiseok Lim. 2024. Generative Interpretation: Toward Human-Like Evaluation for Educational Question-Answer Pair Generation. In Findings of the Association for Computational Linguistics: EACL 2024, pages 2185–2196, St. Julian’s, Malta. Association for Computational Linguistics.
Cite (Informal): Generative Interpretation: Toward Human-Like Evaluation for Educational Question-Answer Pair Generation (Moon et al., Findings 2024)
PDF: https://aclanthology.org/2024.findings-eacl.145.pdf
Software: 2024.findings-eacl.145.software.zip