@inproceedings{he-etal-2025-select,
title = "Select to Know: An Internal-External Knowledge Self-Selection Framework for Domain-Specific Question Answering",
author = "He, Bolei and
He, Xinran and
Shao, Run and
Shu, Shanfu and
Xue, Xianwei and
Cheng, MingQuan and
Li, Haifeng and
Ling, Zhen-Hua",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-emnlp.565/",
pages = "10683--10703",
ISBN = "979-8-89176-335-7",
abstract = "Large Language Models (LLMs) perform well in general QA but often struggle in domain-specific scenarios. Retrieval-Augmented Generation (RAG) introduces external knowledge but suffers from hallucinations and latency due to noisy retrievals. Continued pretraining internalizes domain knowledge but is costly and lacks cross-domain flexibility. We attribute this challenge to the long-tail distribution of domain knowledge, which leaves partial yet useful internal knowledge underutilized. We further argue that knowledge acquisition should be progressive, mirroring human learning: first understanding concepts, then applying them to complex reasoning. To address this, we propose Selct2Know (S2K), a cost-effective framework that internalizes domain knowledge through an internal-external knowledge self-selection strategy and selective supervised fine-tuning. We also introduce a structured reasoning data generation pipeline and integrate GRPO to enhance reasoning ability. Experiments on medical, legal, and financial QA benchmarks show that S2K consistently outperforms existing methods and matches domain-pretrained LLMs with significantly lower cost."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="he-etal-2025-select">
<titleInfo>
<title>Select to Know: An Internal-External Knowledge Self-Selection Framework for Domain-Specific Question Answering</title>
</titleInfo>
<name type="personal">
<namePart type="given">Bolei</namePart>
<namePart type="family">He</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Xinran</namePart>
<namePart type="family">He</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Run</namePart>
<namePart type="family">Shao</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Shanfu</namePart>
<namePart type="family">Shu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Xianwei</namePart>
<namePart type="family">Xue</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">MingQuan</namePart>
<namePart type="family">Cheng</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Haifeng</namePart>
<namePart type="family">Li</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Zhen-Hua</namePart>
<namePart type="family">Ling</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-11</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Findings of the Association for Computational Linguistics: EMNLP 2025</title>
</titleInfo>
<name type="personal">
<namePart type="given">Christos</namePart>
<namePart type="family">Christodoulopoulos</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tanmoy</namePart>
<namePart type="family">Chakraborty</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Carolyn</namePart>
<namePart type="family">Rose</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Violet</namePart>
<namePart type="family">Peng</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Suzhou, China</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-335-7</identifier>
</relatedItem>
<abstract>Large Language Models (LLMs) perform well in general QA but often struggle in domain-specific scenarios. Retrieval-Augmented Generation (RAG) introduces external knowledge but suffers from hallucinations and latency due to noisy retrievals. Continued pretraining internalizes domain knowledge but is costly and lacks cross-domain flexibility. We attribute this challenge to the long-tail distribution of domain knowledge, which leaves partial yet useful internal knowledge underutilized. We further argue that knowledge acquisition should be progressive, mirroring human learning: first understanding concepts, then applying them to complex reasoning. To address this, we propose Select2Know (S2K), a cost-effective framework that internalizes domain knowledge through an internal-external knowledge self-selection strategy and selective supervised fine-tuning. We also introduce a structured reasoning data generation pipeline and integrate GRPO to enhance reasoning ability. Experiments on medical, legal, and financial QA benchmarks show that S2K consistently outperforms existing methods and matches domain-pretrained LLMs with significantly lower cost.</abstract>
<identifier type="citekey">he-etal-2025-select</identifier>
<location>
<url>https://aclanthology.org/2025.findings-emnlp.565/</url>
</location>
<part>
<date>2025-11</date>
<extent unit="page">
<start>10683</start>
<end>10703</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Select to Know: An Internal-External Knowledge Self-Selection Framework for Domain-Specific Question Answering
%A He, Bolei
%A He, Xinran
%A Shao, Run
%A Shu, Shanfu
%A Xue, Xianwei
%A Cheng, MingQuan
%A Li, Haifeng
%A Ling, Zhen-Hua
%Y Christodoulopoulos, Christos
%Y Chakraborty, Tanmoy
%Y Rose, Carolyn
%Y Peng, Violet
%S Findings of the Association for Computational Linguistics: EMNLP 2025
%D 2025
%8 November
%I Association for Computational Linguistics
%C Suzhou, China
%@ 979-8-89176-335-7
%F he-etal-2025-select
%X Large Language Models (LLMs) perform well in general QA but often struggle in domain-specific scenarios. Retrieval-Augmented Generation (RAG) introduces external knowledge but suffers from hallucinations and latency due to noisy retrievals. Continued pretraining internalizes domain knowledge but is costly and lacks cross-domain flexibility. We attribute this challenge to the long-tail distribution of domain knowledge, which leaves partial yet useful internal knowledge underutilized. We further argue that knowledge acquisition should be progressive, mirroring human learning: first understanding concepts, then applying them to complex reasoning. To address this, we propose Select2Know (S2K), a cost-effective framework that internalizes domain knowledge through an internal-external knowledge self-selection strategy and selective supervised fine-tuning. We also introduce a structured reasoning data generation pipeline and integrate GRPO to enhance reasoning ability. Experiments on medical, legal, and financial QA benchmarks show that S2K consistently outperforms existing methods and matches domain-pretrained LLMs with significantly lower cost.
%U https://aclanthology.org/2025.findings-emnlp.565/
%P 10683-10703
Markdown (Informal)
[Select to Know: An Internal-External Knowledge Self-Selection Framework for Domain-Specific Question Answering](https://aclanthology.org/2025.findings-emnlp.565/) (He et al., Findings 2025)
ACL
Bolei He, Xinran He, Run Shao, Shanfu Shu, Xianwei Xue, MingQuan Cheng, Haifeng Li, and Zhen-Hua Ling. 2025. Select to Know: An Internal-External Knowledge Self-Selection Framework for Domain-Specific Question Answering. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 10683–10703, Suzhou, China. Association for Computational Linguistics.