Source-Free Domain Adaptation for Question Answering with Masked Self-training

Maxwell J. Yin, Boyu Wang, Yue Dong, Charles Ling


Abstract
Previous unsupervised domain adaptation (UDA) methods for question answering (QA) require access to source domain data while fine-tuning the model for the target domain. Source domain data may, however, contain sensitive information and should be protected. In this study, we investigate a more challenging setting, source-free UDA, in which we have only the pretrained source model and target domain data, without access to source domain data. We propose a novel self-training approach for QA models that integrates a specially designed mask module for domain adaptation. The mask is adjusted automatically to extract key domain knowledge during training on the source domain. To retain previously learned domain knowledge, certain mask weights are frozen during adaptation, while the remaining weights are adjusted to mitigate domain shift using pseudo-labeled samples generated in the target domain. Our empirical results on four benchmark datasets suggest that our approach significantly enhances the performance of pretrained QA models on the target domain, and even outperforms models that have access to the source data during adaptation.
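
The mechanism described in the abstract lends itself to a short illustration. Below is a minimal, hypothetical PyTorch sketch of a learnable feature mask whose largest-magnitude weights (assumed here to encode the key source-domain knowledge) are frozen before target-domain self-training. The names FeatureMask, freeze_key_weights, and keep_ratio are illustrative assumptions, not the authors' implementation, and the pseudo-labeling loop is omitted.

    # Hypothetical sketch of the masked self-training idea; names and the
    # freezing criterion are assumptions, not the paper's actual code.
    import torch
    import torch.nn as nn

    class FeatureMask(nn.Module):
        """Learnable element-wise mask over encoder hidden states."""
        def __init__(self, hidden_size: int):
            super().__init__()
            # One logit per hidden dimension; sigmoid gives a soft mask in (0, 1).
            self.logits = nn.Parameter(torch.zeros(hidden_size))

        def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
            return hidden_states * torch.sigmoid(self.logits)

        def freeze_key_weights(self, keep_ratio: float = 0.5) -> None:
            # Assumption: the largest-magnitude entries of a source-trained
            # mask carry the key source knowledge, so their gradients are
            # zeroed while the remaining entries stay trainable on the target.
            k = int(keep_ratio * self.logits.numel())
            frozen = torch.zeros_like(self.logits, dtype=torch.bool)
            frozen[torch.topk(self.logits.detach().abs(), k).indices] = True
            self.logits.register_hook(lambda grad: grad.masked_fill(frozen, 0.0))

    # Usage: freeze half of a (source-trained) mask, then confirm that frozen
    # positions receive zero gradient when adapting on target-domain batches.
    mask = FeatureMask(hidden_size=768)
    mask.freeze_key_weights(keep_ratio=0.5)
    features = torch.randn(2, 16, 768)   # (batch, seq_len, hidden) encoder output
    mask(features).sum().backward()      # stand-in for a pseudo-label QA loss

In the full method, the mask logits would first be trained jointly with the QA model on the source domain; the freezing step shown here would then precede self-training on confident pseudo-labeled target samples.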
Anthology ID:
2024.tacl-1.40
Volume:
Transactions of the Association for Computational Linguistics, Volume 12
Year:
2024
Address:
Cambridge, MA
Venue:
TACL
Publisher:
MIT Press
Pages:
721–737
URL:
https://aclanthology.org/2024.tacl-1.40
DOI:
10.1162/tacl_a_00669
Cite (ACL):
Maxwell J. Yin, Boyu Wang, Yue Dong, and Charles Ling. 2024. Source-Free Domain Adaptation for Question Answering with Masked Self-training. Transactions of the Association for Computational Linguistics, 12:721–737.
Cite (Informal):
Source-Free Domain Adaptation for Question Answering with Masked Self-training (Yin et al., TACL 2024)
PDF:
https://aclanthology.org/2024.tacl-1.40.pdf