Improving Self-training with Prototypical Learning for Source-Free Domain Adaptation on Clinical Text

Seiji Shimizu, Shuntaro Yada, Lisa Raithel, Eiji Aramaki


Abstract
Domain adaptation is crucial in the clinical domain since the performance of a model trained on one domain (source) degrades severely when applied to another domain (target). However, conventional domain adaptation methods often cannot be applied due to data-sharing restrictions on source data. Source-Free Domain Adaptation (SFDA) addresses this issue by utilizing only a source model and unlabeled target data to adapt to the target domain. In SFDA, self-training is the most widely applied method: the model is retrained on target data using predictions from the source model as pseudo-labels. Nevertheless, this approach is prone to substantial pseudo-labeling errors, which can limit model performance in the target domain. In this paper, we propose Source-Free Prototype-based Self-training (SFPS), which aims to improve the performance of self-training. SFPS generates prototypes without accessing source data and utilizes them for prototypical learning, namely prototype-based pseudo-labeling and contrastive learning. We also compare entropy-based, centroid-based, and class-weights-based prototype generation methods to identify the most effective formulation of the proposed method. Experimental results across various datasets demonstrate the effectiveness of the proposed method, which consistently outperforms vanilla self-training. The comparison of prototype-generation methods identifies the most reliable one, which consistently improves the source model. Additionally, our analysis shows that SFPS successfully alleviates pseudo-labeling errors.
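
As a rough illustration of the idea described in the abstract (not the authors' implementation; embeddings, class counts, and the cosine-similarity choice below are assumptions for the example), centroid-based prototype generation and prototype-based pseudo-labeling could be sketched as follows:

```python
import numpy as np

# Hypothetical sketch: build centroid-based prototypes from a source
# model's predictions on unlabeled target data, then refine pseudo-labels
# by nearest-prototype assignment. Toy data only.

rng = np.random.default_rng(0)

# Target-domain sentence embeddings produced by the source model (toy: 100 x 16)
target_embs = rng.normal(size=(100, 16))
# Noisy source-model predictions used to seed the prototypes (3 classes)
source_preds = rng.integers(0, 3, size=100)

def build_prototypes(embs, preds, num_classes):
    """Centroid-based prototypes: mean embedding of samples per predicted class."""
    return np.stack([embs[preds == c].mean(axis=0) for c in range(num_classes)])

def prototype_pseudo_labels(embs, prototypes):
    """Assign each sample to its nearest prototype by cosine similarity."""
    embs_n = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    protos_n = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sims = embs_n @ protos_n.T          # (num_samples, num_classes)
    return sims.argmax(axis=1)          # refined pseudo-labels

prototypes = build_prototypes(target_embs, source_preds, num_classes=3)
pseudo_labels = prototype_pseudo_labels(target_embs, prototypes)
# The refined pseudo-labels would then supervise self-training, and the
# prototypes could anchor a contrastive objective on the target data.
```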
Anthology ID:
2024.bionlp-1.1
Volume:
Proceedings of the 23rd Workshop on Biomedical Natural Language Processing
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Dina Demner-Fushman, Sophia Ananiadou, Makoto Miwa, Kirk Roberts, Junichi Tsujii
Venues:
BioNLP | WS
SIG:
SIGBIOMED
Publisher:
Association for Computational Linguistics
Pages:
1–13
URL:
https://aclanthology.org/2024.bionlp-1.1
Cite (ACL):
Seiji Shimizu, Shuntaro Yada, Lisa Raithel, and Eiji Aramaki. 2024. Improving Self-training with Prototypical Learning for Source-Free Domain Adaptation on Clinical Text. In Proceedings of the 23rd Workshop on Biomedical Natural Language Processing, pages 1–13, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Improving Self-training with Prototypical Learning for Source-Free Domain Adaptation on Clinical Text (Shimizu et al., BioNLP-WS 2024)
PDF:
https://aclanthology.org/2024.bionlp-1.1.pdf