2024
L3Masking: Multi-task Fine-tuning for Language Models by Leveraging Lessons Learned from Vanilla Models
Yusuke Kimura | Takahiro Komamizu | Kenji Hatano
Proceedings of the 1st Workshop on Customizable NLP: Progress and Challenges in Customizing NLP for a Domain, Application, Group, or Individual (CustomNLP4U)
When distributional differences exist between pre-training and fine-tuning data, language models (LMs) may perform poorly on downstream tasks. Recent studies have reported that multi-task learning of the downstream task and a masked language modeling (MLM) task during the fine-tuning phase improves performance on the downstream task. Typical MLM strategies (e.g., random token masking (RTM)) pay no particular attention to tokens corresponding to knowledge already acquired during the pre-training phase; as a result, LMs may overlook important clues or fail to acquire the linguistic knowledge of the task or domain effectively. To overcome this limitation, we propose a new masking strategy for the MLM task, called L3Masking, that leverages lessons (specifically, token-wise likelihood in a context) learned from the vanilla language model to be fine-tuned. L3Masking preferentially masks tokens that have low likelihood under the vanilla model. Experimental evaluations on text classification tasks in different domains confirm that a multi-task text classification method with L3Masking performs task adaptation more effectively than one with RTM. These results suggest the usefulness of assigning a preference to the tokens to be learned for task or domain adaptation.
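Below is a minimal sketch (not the authors' code) of the masking idea described in the abstract: tokens that the vanilla pre-trained model assigns a low likelihood in context are masked preferentially for the auxiliary MLM objective. The model name, the masking ratio, and the single-forward-pass likelihood estimate are illustrative assumptions.

```python
# Likelihood-guided masking in the spirit of L3Masking (illustrative sketch only).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL_NAME = "bert-base-uncased"  # assumption; any vanilla masked LM would do
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
vanilla_lm = AutoModelForMaskedLM.from_pretrained(MODEL_NAME).eval()

def l3_mask(text: str, mask_ratio: float = 0.15) -> torch.Tensor:
    """Return input_ids with the lowest-likelihood tokens replaced by [MASK]."""
    enc = tokenizer(text, return_tensors="pt")
    input_ids = enc["input_ids"]
    with torch.no_grad():
        logits = vanilla_lm(**enc).logits                                # (1, seq_len, vocab)
    probs = logits.softmax(dim=-1)
    # Likelihood the vanilla model assigns to each observed token in its context.
    token_probs = probs.gather(-1, input_ids.unsqueeze(-1)).squeeze(-1)  # (1, seq_len)
    # Exclude special tokens ([CLS], [SEP], ...) from masking.
    special = torch.tensor(
        tokenizer.get_special_tokens_mask(input_ids[0].tolist(),
                                          already_has_special_tokens=True),
        dtype=torch.bool,
    ).unsqueeze(0)
    token_probs = token_probs.masked_fill(special, float("inf"))
    n_mask = max(1, int(mask_ratio * int((~special).sum())))
    lowest = token_probs[0].topk(n_mask, largest=False).indices
    masked_ids = input_ids.clone()
    masked_ids[0, lowest] = tokenizer.mask_token_id
    return masked_ids
```

In a multi-task fine-tuning setup, the masked sequence would feed the MLM head while the original sequence feeds the downstream classifier; how the two losses are weighted is not specified here.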
Do LLMs Agree with Humans on Emotional Associations to Nonsense Words?
Yui Miyakawa | Chihaya Matsuhira | Hirotaka Kato | Takatsugu Hirayama | Takahiro Komamizu | Ichiro Ide
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics
Understanding human perception of nonsense words is helpful for devising product and character names that match their characteristics. Previous studies have suggested the usefulness of Large Language Models (LLMs) for estimating such human perception, but they did not focus on its emotional aspects. Hence, this study aims to elucidate the relationship between the emotions that nonsense words evoke in humans and in LLMs. Using a representative LLM, GPT-4, we reproduce the procedure of an existing study that analyzed the emotions nonsense words evoke in humans. A positive correlation of 0.40 was found between the emotion intensity scores reproduced by GPT-4 and those manually annotated by humans. Although the correlation is not very high, this demonstrates that GPT-4 may agree with humans on emotional associations to nonsense words. Considering that the previous study reported an average correlation of about 0.68 among human annotators and of 0.17 between humans and a regression model trained on annotations for real words, GPT-4’s agreement with humans is notably strong.
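As a rough illustration of the agreement measure reported above (not the study's actual data or prompts), one could correlate the emotion intensity scores an LLM produces for nonsense words with human annotations; the word list and scores below are made up.

```python
# Hypothetical example: correlating LLM-produced emotion intensity scores
# with human annotations for the same nonsense words.
from scipy.stats import pearsonr

words = ["bouba", "kiki", "maluma", "takete"]   # illustrative nonsense words
llm_scores = [0.72, 0.31, 0.68, 0.25]           # made-up GPT-4 intensity scores
human_scores = [0.65, 0.40, 0.70, 0.30]         # made-up human annotations

r, p = pearsonr(llm_scores, human_scores)
print(f"Pearson correlation = {r:.2f} (p = {p:.3f})")
```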
2023
Discovering Phonesthemic Clusters in Readings of Kanji Characters toward Exploring Phonestheme in Japanese
Akira Yoshida | Chihaya Matsuhira | Hirotaka Kato | Takatsugu Hirayama | Takahiro Komamizu | Ichiro Ide
Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation
2021
Evaluation Scheme of Focal Translation for Japanese Partially Amended Statutes
Takahiro Yamakoshi | Takahiro Komamizu | Yasuhiro Ogawa | Katsuhiko Toyama
Proceedings of the 8th Workshop on Asian Translation (WAT2021)
For updating the translations of Japanese statutes based on their amendments, we need to consider the translation “focality”; that is, we should only modify expressions that are relevant to the amendment and retain the others to avoid misconstruing its contents. In this paper, we introduce an evaluation metric and a corpus to improve focality evaluations. Our metric is called the Inclusive Score for DIfferential Translation (ISDIT). ISDIT consists of two factors: (1) the n-gram recall of expressions unaffected by the amendment and (2) the n-gram precision of the output compared to the reference. This metric supersedes an existing focality metric by simultaneously calculating the translation quality of the changed expressions in addition to that of the unchanged expressions. We also compile a new corpus for Japanese partial amendment translation that secures the focality of the post-amendment translations, which an existing evaluation corpus does not. With the metric and the corpus, we examine the performance of existing translation methods for Japanese partial amendment translation.
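As a rough sketch of the two factors described for ISDIT (not the paper's exact formulation), the n-gram recall over unaffected expressions and the n-gram precision against the reference could be computed as below; the choice of n, the whitespace tokenization, and combining the factors by their product are assumptions for illustration.

```python
# Sketch of the two ISDIT factors: (1) n-gram recall of expressions unaffected
# by the amendment and (2) n-gram precision of the output against the reference.
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def isdit_factors(output, reference, unaffected, n=2):
    """output / reference / unaffected are token lists; returns (recall, precision)."""
    out, ref, keep = ngrams(output, n), ngrams(reference, n), ngrams(unaffected, n)
    # Factor 1: how many n-grams that should stay untouched survive in the output.
    recall = sum((out & keep).values()) / max(1, sum(keep.values()))
    # Factor 2: how many output n-grams are supported by the reference.
    precision = sum((out & ref).values()) / max(1, sum(out.values()))
    return recall, precision

# Toy usage with made-up sentences.
recall, precision = isdit_factors(
    "the provision shall apply mutatis mutandis".split(),
    "the provision shall apply mutatis mutandis".split(),
    "the provision shall".split(),
)
score = recall * precision  # combining by product is an assumption
```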