ExpertGenQA: Open-ended QA generation in Specialized Domains

Haz Sameen Shahgir, Chansong Lim, Jia Chen, Evangelos E. Papalexakis, Yue Dong


Abstract
Generating high-quality question–answer (QA) pairs for specialized technical domains is essential for advancing knowledge comprehension, yet remains challenging. Existing methods often yield generic or shallow questions that fail to reflect the depth and structure of expert-written examples. We propose ExpertGenQA, a generation protocol that combines few-shot prompting with dual categorization by topic and question style to produce more diverse and cognitively meaningful QA pairs. ExpertGenQA achieves twice the efficiency of standard few-shot methods while maintaining 94.4% topic coverage. Unlike LLM-based judges, which often favor surface fluency, Bloom’s Taxonomy analysis shows that ExpertGenQA better captures expert-level cognitive complexity. When used to train retrieval systems, our questions improve top-1 accuracy by 13.02%, demonstrating their practical value for domain-specific applications.
Anthology ID:
2025.findings-emnlp.159
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2934–2955
URL:
https://aclanthology.org/2025.findings-emnlp.159/
Cite (ACL):
Haz Sameen Shahgir, Chansong Lim, Jia Chen, Evangelos E. Papalexakis, and Yue Dong. 2025. ExpertGenQA: Open-ended QA generation in Specialized Domains. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 2934–2955, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
ExpertGenQA: Open-ended QA generation in Specialized Domains (Shahgir et al., Findings 2025)
PDF:
https://aclanthology.org/2025.findings-emnlp.159.pdf
Checklist:
2025.findings-emnlp.159.checklist.pdf