Domain Knowledge Transferring for Pre-trained Language Model via Calibrated Activation Boundary Distillation

Dongha Choi, HongSeok Choi, Hyunju Lee


Abstract
Since the development and wide use of pretrained language models (PLMs), several approaches have been applied to boost their performance on downstream tasks in specific domains, such as the biomedical or scientific domains. Additional pre-training with in-domain texts is the most common approach for providing domain-specific knowledge to PLMs. However, these pre-training methods require considerable in-domain data, substantial training resources, and long training times. Moreover, the training must be repeated whenever a new PLM emerges. In this study, we propose a domain knowledge transferring (DoKTra) framework for PLMs that requires no additional in-domain pretraining. Specifically, we extract the domain knowledge from an existing in-domain pretrained language model and transfer it to other PLMs by applying knowledge distillation. In particular, we employ activation boundary distillation, which focuses on the activation of hidden neurons. We also apply an entropy regularization term in both teacher training and distillation to encourage the model to generate reliable output probabilities, which in turn aids the distillation. When the proposed DoKTra framework is applied to downstream tasks in the biomedical, clinical, and financial domains, our student models retain a high percentage of teacher performance and even outperform the teachers on certain tasks. Our code is available at https://github.com/DMCB-GIST/DoKTra.
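The two loss terms named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes a hinge-style activation boundary loss that pushes each student neuron to the same side of zero as the corresponding teacher neuron, and an entropy regularizer of the confidence-penalty form (cross-entropy minus a weighted output entropy). The function names, the `margin` value, the zero activation threshold, and the weight `beta` are all hypothetical choices made for illustration.

```python
import math


def activation_boundary_loss(teacher_pre, student_pre, margin=1.0):
    """Squared hinge loss on pre-activation values (illustrative form only).

    A student neuron is penalized unless it lies on the same side of zero
    as the teacher neuron, by at least `margin` (an assumed hyperparameter).
    """
    total = 0.0
    for t, s in zip(teacher_pre, student_pre):
        if t > 0:
            # Teacher neuron is active: the student should satisfy s >= margin.
            hinge = max(0.0, margin - s)
        else:
            # Teacher neuron is inactive: the student should satisfy s <= -margin.
            hinge = max(0.0, margin + s)
        total += hinge ** 2
    return total


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def entropy_regularized_ce(logits, label, beta=0.1):
    """Cross-entropy minus beta times the entropy of the output distribution.

    Subtracting the entropy discourages over-confident predictions;
    `beta` is an assumed weight, not a value taken from the paper.
    """
    log_probs = log_softmax(logits)
    cross_entropy = -log_probs[label]
    entropy = -sum(math.exp(lp) * lp for lp in log_probs)
    return cross_entropy - beta * entropy
```

In this sketch, a student whose hidden pre-activations already match the teacher's activation pattern with enough margin incurs zero boundary loss, while the entropy term lowers the loss for less peaked output distributions. How the two terms are weighted and scheduled during teacher training and distillation is left open here.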
Anthology ID:
2022.acl-long.116
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
1658–1669
URL:
https://aclanthology.org/2022.acl-long.116
DOI:
10.18653/v1/2022.acl-long.116
Cite (ACL):
Dongha Choi, HongSeok Choi, and Hyunju Lee. 2022. Domain Knowledge Transferring for Pre-trained Language Model via Calibrated Activation Boundary Distillation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1658–1669, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Domain Knowledge Transferring for Pre-trained Language Model via Calibrated Activation Boundary Distillation (Choi et al., ACL 2022)
PDF:
https://aclanthology.org/2022.acl-long.116.pdf
Software:
2022.acl-long.116.software.zip
Code:
dmcb-gist/doktra
Data:
BLUE, HOC