QAEncoder: Towards Aligned Representation Learning in Question Answering Systems

Zhengren Wang; Qinhan Yu; Shida Wei; Zhiyu Li; Feiyu Xiong; Xiaoxing Wang; Simin Niu; Hao Liang; Wentao Zhang

doi:10.18653/v1/2025.acl-long.217

Abstract

Modern QA systems rely on retrieval-augmented generation (RAG) for accurate and trustworthy responses. However, the inherent gap between user queries and relevant documents hinders precise matching. We introduce QAEncoder, a training-free approach to bridge this gap. Specifically, QAEncoder estimates the expectation of potential queries in the embedding space as a robust surrogate for the document embedding, and attaches document fingerprints to effectively distinguish these embeddings. Extensive experiments across diverse datasets, languages, and embedding models confirm QAEncoder's alignment capability. It offers a simple yet effective solution that requires no additional index storage, retrieval latency, or training cost, and avoids catastrophic forgetting and hallucination issues. The repository is publicly available at https://github.com/IAAR-Shanghai/QAEncoder.
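
To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the expectation of potential queries is approximated by the mean of embeddings of a few queries generated for the document, and that the "fingerprint" is a simple blend with the document's own embedding. The function names and the `alpha` weight are hypothetical, chosen only for illustration; the repository linked above is the authoritative source for the actual method.

```python
import numpy as np


def l2_normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def surrogate_embedding(query_embs, doc_emb, alpha=0.3):
    """Blend the mean query embedding with the document embedding.

    query_embs: array of shape (n_queries, dim), embeddings of queries
                that the document could answer (assumed to be given).
    doc_emb:    array of shape (dim,), embedding of the document itself.
    alpha:      hypothetical fingerprint weight in [0, 1]; 0 uses only the
                mean query, 1 uses only the document embedding.
    """
    query_embs = np.asarray(query_embs, dtype=float)
    doc_emb = np.asarray(doc_emb, dtype=float)
    # Sample mean as an estimate of the expected query embedding.
    mean_query = query_embs.mean(axis=0)
    # Mixing in the document embedding keeps documents with similar
    # candidate queries distinguishable from one another.
    mixed = (1.0 - alpha) * l2_normalize(mean_query) + alpha * l2_normalize(doc_emb)
    return l2_normalize(mixed)
```

Under these assumptions the result has the same dimensionality as the original document embedding, so it can replace that embedding in an existing index without changing index size or retrieval code.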

Anthology ID: 2025.acl-long.217
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 4306–4332
URL: https://aclanthology.org/2025.acl-long.217/
DOI: 10.18653/v1/2025.acl-long.217
Cite (ACL): Zhengren Wang, Qinhan Yu, Shida Wei, Zhiyu Li, Feiyu Xiong, Xiaoxing Wang, Simin Niu, Hao Liang, and Wentao Zhang. 2025. QAEncoder: Towards Aligned Representation Learning in Question Answering Systems. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4306–4332, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): QAEncoder: Towards Aligned Representation Learning in Question Answering Systems (Wang et al., ACL 2025)
PDF: https://aclanthology.org/2025.acl-long.217.pdf
