@inproceedings{yu-etal-2025-transfer,
title = "Transfer-Aware Data Selection for Domain Adaptation in Text Retrieval",
author = "Yu, Linzhu and
Li, Huan and
Chen, Ke and
Shou, Lidan",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-emnlp.948/",
pages = "17504--17519",
ISBN = "979-8-89176-335-7",
abstract = "Domain adaptation is widely adopted in text retrieval scenarios where large labeled data is unavailable. To improve model adaptability, existing methods try to expand more source datasets. However, we found from experiments that indiscriminately using a large amount of source data from various text tasks does not guarantee improved adaptability, but may negatively impact model performance. To tackle this issue, we propose Trait, a framework that can effectively improve model adaptability by selecting beneficial data without evaluating all source data. Specifically, we first divide multiple source datasets into data chunks of the same size as the minimum selection unit to form the whole selection space. Then we devise an iterative process that includes Bayesian optimization-based selection and transfer-aware chunk evaluation to incrementally select beneficial chunks. To reduce unnecessary evaluation costs, we also design backtracking and pruning actions to adjust the selection subspace. Extensive experimental results show that Trait not only achieves average state-of-the-art for few-shot on nine target datasets by evaluating only 4{\%} of BERRI source data, but also is very competitive for zero-shot compared with LLM-based rankers."
}

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="yu-etal-2025-transfer">
<titleInfo>
<title>Transfer-Aware Data Selection for Domain Adaptation in Text Retrieval</title>
</titleInfo>
<name type="personal">
<namePart type="given">Linzhu</namePart>
<namePart type="family">Yu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Huan</namePart>
<namePart type="family">Li</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ke</namePart>
<namePart type="family">Chen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Lidan</namePart>
<namePart type="family">Shou</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-11</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Findings of the Association for Computational Linguistics: EMNLP 2025</title>
</titleInfo>
<name type="personal">
<namePart type="given">Christos</namePart>
<namePart type="family">Christodoulopoulos</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tanmoy</namePart>
<namePart type="family">Chakraborty</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Carolyn</namePart>
<namePart type="family">Rose</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Violet</namePart>
<namePart type="family">Peng</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Suzhou, China</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-335-7</identifier>
</relatedItem>
<abstract>Domain adaptation is widely adopted in text retrieval scenarios where large-scale labeled data is unavailable. To improve model adaptability, existing methods incorporate ever more source datasets. However, our experiments show that indiscriminately using large amounts of source data from diverse text tasks does not guarantee improved adaptability and may even degrade model performance. To tackle this issue, we propose Trait, a framework that effectively improves model adaptability by selecting beneficial data without evaluating all source data. Specifically, we first divide the source datasets into equal-sized data chunks, which serve as the minimum selection units and together form the full selection space. We then devise an iterative process that combines Bayesian optimization-based selection with transfer-aware chunk evaluation to incrementally select beneficial chunks. To reduce unnecessary evaluation costs, we also design backtracking and pruning actions that adjust the selection subspace. Extensive experiments show that Trait not only achieves state-of-the-art average few-shot performance on nine target datasets while evaluating only 4% of the BERRI source data, but also remains highly competitive with LLM-based rankers in the zero-shot setting.</abstract>
<identifier type="citekey">yu-etal-2025-transfer</identifier>
<location>
<url>https://aclanthology.org/2025.findings-emnlp.948/</url>
</location>
<part>
<date>2025-11</date>
<extent unit="page">
<start>17504</start>
<end>17519</end>
</extent>
</part>
</mods>
</modsCollection>

%0 Conference Proceedings
%T Transfer-Aware Data Selection for Domain Adaptation in Text Retrieval
%A Yu, Linzhu
%A Li, Huan
%A Chen, Ke
%A Shou, Lidan
%Y Christodoulopoulos, Christos
%Y Chakraborty, Tanmoy
%Y Rose, Carolyn
%Y Peng, Violet
%S Findings of the Association for Computational Linguistics: EMNLP 2025
%D 2025
%8 November
%I Association for Computational Linguistics
%C Suzhou, China
%@ 979-8-89176-335-7
%F yu-etal-2025-transfer
%X Domain adaptation is widely adopted in text retrieval scenarios where large-scale labeled data is unavailable. To improve model adaptability, existing methods incorporate ever more source datasets. However, our experiments show that indiscriminately using large amounts of source data from diverse text tasks does not guarantee improved adaptability and may even degrade model performance. To tackle this issue, we propose Trait, a framework that effectively improves model adaptability by selecting beneficial data without evaluating all source data. Specifically, we first divide the source datasets into equal-sized data chunks, which serve as the minimum selection units and together form the full selection space. We then devise an iterative process that combines Bayesian optimization-based selection with transfer-aware chunk evaluation to incrementally select beneficial chunks. To reduce unnecessary evaluation costs, we also design backtracking and pruning actions that adjust the selection subspace. Extensive experiments show that Trait not only achieves state-of-the-art average few-shot performance on nine target datasets while evaluating only 4% of the BERRI source data, but also remains highly competitive with LLM-based rankers in the zero-shot setting.
%U https://aclanthology.org/2025.findings-emnlp.948/
%P 17504-17519

Markdown (Informal)
[Transfer-Aware Data Selection for Domain Adaptation in Text Retrieval](https://aclanthology.org/2025.findings-emnlp.948/) (Yu et al., Findings 2025)
ACL
Linzhu Yu, Huan Li, Ke Chen, and Lidan Shou. 2025. Transfer-Aware Data Selection for Domain Adaptation in Text Retrieval. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 17504–17519, Suzhou, China. Association for Computational Linguistics.
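
The abstract outlines an iterative select-evaluate loop: equal-sized chunks form the selection space, a Bayesian-optimization-based step proposes the next chunk, a transfer-aware evaluation measures its effect, and backtracking and pruning actions adjust the selection subspace. The sketch below is a hypothetical, self-contained illustration of that loop's shape, not the authors' implementation: the chunk utilities, the `evaluate_transfer` stand-in, and the simplified acquisition function (a feature score plus a decaying exploration bonus, in place of a real Gaussian-process surrogate) are all assumptions.

```python
"""Hypothetical sketch of the chunk-selection loop described in the
abstract. NOT the authors' implementation: the chunk utilities, the
evaluate_transfer stand-in, and the toy acquisition function are
simulated assumptions used only to show the shape of the loop."""

import math
import random

random.seed(0)

# Assumed setup: the source pool is split into 100 equal-sized chunks,
# each with a hidden transfer utility (positive helps the target task).
NUM_CHUNKS = 100
hidden_utility = {c: random.gauss(0.0, 1.0) for c in range(NUM_CHUNKS)}
# Cheap observable features (e.g. similarity to the target data) that a
# surrogate could condition on; here, noisy copies of the utility.
feature = {c: hidden_utility[c] + random.gauss(0.0, 0.5)
           for c in range(NUM_CHUNKS)}


def evaluate_transfer(selected: set[int]) -> float:
    """Stand-in for transfer-aware evaluation. In the paper this would
    mean training on the selected chunks and scoring a target dev set;
    here it is simulated as a noisy sum of hidden utilities."""
    return sum(hidden_utility[c] for c in selected) + random.gauss(0.0, 0.2)


def acquisition(candidate: int, n_observed: int) -> float:
    """Toy acquisition score standing in for a real Bayesian-optimization
    surrogate: exploit the feature, with a decaying exploration bonus."""
    return feature[candidate] + random.random() / math.sqrt(1 + n_observed)


search_space = set(range(NUM_CHUNKS))  # the whole selection space
selected: set[int] = set()
observed: dict[int, float] = {}        # noisy marginal gain per chunk
best_score = evaluate_transfer(selected)

for step in range(30):
    if not search_space:
        break
    # 1) Selection: the surrogate proposes the most promising chunk.
    candidate = max(search_space, key=lambda c: acquisition(c, len(observed)))
    search_space.discard(candidate)

    # 2) Evaluation: measure the candidate's marginal effect.
    trial_score = evaluate_transfer(selected | {candidate})
    gain = trial_score - best_score
    observed[candidate] = gain
    if gain > 0:
        selected.add(candidate)
        best_score = trial_score
    elif gain < -0.5:
        # 3) Pruning: drop look-alikes of a clearly harmful chunk to
        #    shrink the selection subspace without evaluating them.
        search_space -= {c for c in search_space
                         if abs(feature[c] - feature[candidate]) < 0.1}

    # 4) Backtracking: retire the weakest selected chunk if the score
    #    improves without it (possible here since evaluation is noisy).
    if selected:
        weakest = min(selected, key=lambda c: observed[c])
        without = evaluate_transfer(selected - {weakest})
        if without > best_score:
            selected.discard(weakest)
            best_score = without

print(f"selected {len(selected)} of {NUM_CHUNKS} chunks, "
      f"score {best_score:.2f}")
```

In the framework the abstract describes, the evaluation step is presumably the expensive part (training a retriever on the selected chunks), which is why surrogate-guided selection, pruning, and backtracking matter: they let the method evaluate only a small fraction of the source pool, reportedly 4% of BERRI.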