Evangelos E. Papalexakis
2025
ExpertGenQA: Open-ended QA generation in Specialized Domains
Haz Sameen Shahgir | Chansong Lim | Jia Chen | Evangelos E. Papalexakis | Yue Dong
Findings of the Association for Computational Linguistics: EMNLP 2025
Generating high-quality question–answer (QA) pairs for specialized technical domains is essential for advancing knowledge comprehension, yet remains challenging. Existing methods often yield generic or shallow questions that fail to reflect the depth and structure of expert-written examples. We propose ExpertGenQA, a generation protocol that combines few-shot prompting with dual categorization by topic and question style to produce more diverse and cognitively meaningful QA pairs. ExpertGenQA achieves twice the efficiency of standard few-shot methods while maintaining 94.4% topic coverage. Unlike LLM-based judges, which often favor surface fluency, Bloom’s Taxonomy analysis shows that ExpertGenQA better captures expert-level cognitive complexity. When used to train retrieval systems, our questions improve top-1 accuracy by 13.02%, demonstrating their practical value for domain-specific applications.