L3Masking: Multi-task Fine-tuning for Language Models by Leveraging Lessons Learned from Vanilla Models

Yusuke Kimura, Takahiro Komamizu, Kenji Hatano


Abstract
When distributional differences exist between the pre-training and fine-tuning data, language models (LMs) may perform poorly on downstream tasks. Recent studies have reported that multi-task learning of a downstream task and a masked language modeling (MLM) task during the fine-tuning phase improves downstream performance. Typical MLM objectives (e.g., random token masking (RTM)) do not distinguish tokens corresponding to knowledge already acquired during pre-training, so LMs may overlook important clues and acquire task- or domain-specific linguistic knowledge less effectively. To overcome this limitation, we propose a new masking strategy for the MLM task, called L3Masking, that leverages lessons (specifically, token-wise likelihood in context) learned from the vanilla language model to be fine-tuned. L3Masking preferentially masks tokens that have low likelihood under the vanilla model. Experimental evaluations on text classification tasks in different domains confirm that a multi-task text classification method with L3Masking performs task adaptation more effectively than one with RTM. These results suggest the usefulness of assigning a preference to the tokens to be learned for task or domain adaptation.
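As a rough illustration of the idea described in the abstract, the sketch below shows one way likelihood-guided masking could look with a BERT-style model. This is a minimal sketch under stated assumptions, not the authors' implementation: the paper defines the exact scoring and selection rules of L3Masking, while here each observed token's likelihood is simply read off the vanilla model's output distribution and the lowest-likelihood tokens are masked instead of random ones. The model name, function name, and masking ratio are illustrative choices.

```python
# Minimal sketch of likelihood-guided masking (not the paper's exact procedure).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
vanilla_lm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def likelihood_guided_mask(text: str, mask_ratio: float = 0.15) -> torch.Tensor:
    """Return input_ids with the lowest-likelihood tokens replaced by [MASK]."""
    enc = tokenizer(text, return_tensors="pt")
    input_ids = enc["input_ids"].clone()

    # Token-wise likelihood under the vanilla (not yet fine-tuned) model.
    # Simplification: the full, unmasked sequence is scored in one pass.
    with torch.no_grad():
        logits = vanilla_lm(**enc).logits            # (1, seq_len, vocab)
    probs = logits.softmax(dim=-1)[0]                # (seq_len, vocab)
    token_probs = probs.gather(1, input_ids[0].unsqueeze(1)).squeeze(1)

    # Never mask special tokens such as [CLS] and [SEP].
    special = torch.tensor(
        tokenizer.get_special_tokens_mask(
            input_ids[0].tolist(), already_has_special_tokens=True
        ),
        dtype=torch.bool,
    )
    token_probs[special] = float("inf")

    # Mask the mask_ratio fraction of tokens with the lowest likelihood.
    n_mask = max(1, int(mask_ratio * (~special).sum().item()))
    mask_positions = token_probs.topk(n_mask, largest=False).indices
    input_ids[0, mask_positions] = tokenizer.mask_token_id
    return input_ids
```

In a multi-task fine-tuning setup, inputs masked this way would feed the MLM objective alongside the downstream classification objective; how the two losses are combined and how often the likelihoods are recomputed is specified in the paper rather than in this sketch.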
Anthology ID:
2024.customnlp4u-1.6
Volume:
Proceedings of the 1st Workshop on Customizable NLP: Progress and Challenges in Customizing NLP for a Domain, Application, Group, or Individual (CustomNLP4U)
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Sachin Kumar, Vidhisha Balachandran, Chan Young Park, Weijia Shi, Shirley Anugrah Hayati, Yulia Tsvetkov, Noah Smith, Hannaneh Hajishirzi, Dongyeop Kang, David Jurgens
Venue:
CustomNLP4U
Publisher:
Association for Computational Linguistics
Pages:
53–62
URL:
https://aclanthology.org/2024.customnlp4u-1.6
Cite (ACL):
Yusuke Kimura, Takahiro Komamizu, and Kenji Hatano. 2024. L3Masking: Multi-task Fine-tuning for Language Models by Leveraging Lessons Learned from Vanilla Models. In Proceedings of the 1st Workshop on Customizable NLP: Progress and Challenges in Customizing NLP for a Domain, Application, Group, or Individual (CustomNLP4U), pages 53–62, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
L3Masking: Multi-task Fine-tuning for Language Models by Leveraging Lessons Learned from Vanilla Models (Kimura et al., CustomNLP4U 2024)
PDF:
https://aclanthology.org/2024.customnlp4u-1.6.pdf