2024
L3Masking: Multi-task Fine-tuning for Language Models by Leveraging Lessons Learned from Vanilla Models
Yusuke Kimura | Takahiro Komamizu | Kenji Hatano
Proceedings of the 1st Workshop on Customizable NLP: Progress and Challenges in Customizing NLP for a Domain, Application, Group, or Individual (CustomNLP4U)
When distributional differences exist between pre-training and fine-tuning data, language models (LMs) may perform poorly on downstream tasks. Recent studies have reported that multi-task learning of the downstream task and a masked language modeling (MLM) task during the fine-tuning phase improves downstream task performance. Typical MLM strategies (e.g., random token masking (RTM)) tend to disregard whether a token corresponds to knowledge already acquired during the pre-training phase; as a result, LMs may miss important clues or fail to acquire the linguistic knowledge of the task or domain effectively. To overcome this limitation, we propose a new masking strategy for the MLM task, called L3Masking, that leverages lessons (specifically, token-wise likelihood in a context) learned from the vanilla language model to be fine-tuned. L3Masking actively masks tokens that have low likelihood under the vanilla model. Experimental evaluations on text classification tasks in different domains confirm that a multi-task text classification method with L3Masking performs task adaptation more effectively than one with RTM. These results suggest the usefulness of assigning a preference to the tokens to be learned for task or domain adaptation.
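A minimal sketch of the likelihood-guided masking idea described in the abstract, assuming a Hugging Face BERT-style masked LM. The scoring detail (estimating each token's likelihood by masking one position at a time under the vanilla model) and the masking ratio are illustrative assumptions, not the paper's exact procedure.

```python
# Illustrative sketch of likelihood-guided masking (not the authors' exact code).
# Assumption: per-token likelihood is estimated with the vanilla (not yet
# fine-tuned) masked LM by masking one position at a time and reading the
# probability the model assigns to the original token.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL_NAME = "bert-base-uncased"   # assumed backbone
MASK_RATIO = 0.15                  # assumed masking budget, as in standard MLM

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
vanilla_lm = AutoModelForMaskedLM.from_pretrained(MODEL_NAME).eval()


@torch.no_grad()
def token_likelihoods(input_ids: torch.Tensor) -> torch.Tensor:
    """Probability the vanilla model assigns to each original token when that
    position alone is replaced by [MASK] (one forward pass per position)."""
    seq_len = input_ids.size(0)
    probs = torch.ones(seq_len)
    for pos in range(1, seq_len - 1):           # skip [CLS] and [SEP]
        masked = input_ids.clone()
        masked[pos] = tokenizer.mask_token_id
        logits = vanilla_lm(masked.unsqueeze(0)).logits[0, pos]
        probs[pos] = torch.softmax(logits, dim=-1)[input_ids[pos]]
    return probs


def l3_mask(text: str) -> tuple[torch.Tensor, torch.Tensor]:
    """Return (masked input_ids, MLM labels) where the lowest-likelihood
    tokens, rather than randomly chosen ones, are replaced by [MASK]."""
    input_ids = tokenizer(text, return_tensors="pt")["input_ids"][0]
    probs = token_likelihoods(input_ids)
    n_mask = max(1, int(MASK_RATIO * (input_ids.size(0) - 2)))
    mask_positions = probs.argsort()[:n_mask]   # least likely under vanilla LM

    labels = torch.full_like(input_ids, -100)   # -100 is ignored by the MLM loss
    labels[mask_positions] = input_ids[mask_positions]
    masked_ids = input_ids.clone()
    masked_ids[mask_positions] = tokenizer.mask_token_id
    return masked_ids, labels
```

In a multi-task fine-tuning setup, the (masked_ids, labels) pair would feed the MLM objective while the unmasked text feeds the downstream classification head; how the two losses are weighted and whether likelihoods are recomputed during training are details governed by the paper, not this sketch.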