DADA: Distribution-Aware Domain Adaptation of PLMs for Information Retrieval

Dohyeon Lee, Jongyoon Kim, Seung-won Hwang, Joonsuk Park


Abstract
Pre-trained language models (PLMs) show promise in retrieval tasks but struggle with out-of-domain data due to distribution shifts. To address this, generative domain adaptation (DA), known as GPL, generates pseudo queries and labels to train models to predict query-document relationships in new domains. However, it overlooks the domain distribution, so the model struggles to align with the distribution of the target domain. We therefore propose Distribution-Aware Domain Adaptation (DADA), which guides the model to consider domain distribution knowledge at the level of both a single document and the corpus, referred to as observation-level feedback and domain-level feedback, respectively. Empirical results on the BEIR benchmark show that our method effectively adapts the model to the target domain and expands document representations to cover unseen gold query terms using domain and observation feedback.
Anthology ID:
2024.findings-acl.825
Volume:
Findings of the Association for Computational Linguistics ACL 2024
Month:
August
Year:
2024
Address:
Bangkok, Thailand and virtual meeting
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
13882–13893
URL:
https://aclanthology.org/2024.findings-acl.825
Cite (ACL):
Dohyeon Lee, Jongyoon Kim, Seung-won Hwang, and Joonsuk Park. 2024. DADA: Distribution-Aware Domain Adaptation of PLMs for Information Retrieval. In Findings of the Association for Computational Linguistics ACL 2024, pages 13882–13893, Bangkok, Thailand and virtual meeting. Association for Computational Linguistics.
Cite (Informal):
DADA: Distribution-Aware Domain Adaptation of PLMs for Information Retrieval (Lee et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-acl.825.pdf