Kazuki Kodaki


2025

Output Trend Analysis in Semantic Classification of Katakana Words Using a Large Language Model
Kazuki Kodaki | Minoru Sasaki
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

In semantic classification of katakana words using a large language model (LLM), semantic divergences from the original English meanings, as in Wasei-Eigo (Japanese-made English), may affect the model's accuracy. To capture the meaning of these loanwords accurately, we fine-tuned the LLM on data extracted from the BCCWJ (Balanced Corpus of Contemporary Written Japanese), analyzed the current accuracy and output trends of semantic classification for katakana words, and explored ways to improve accuracy. The results of several experiments showed that fine-tuning was not effective in the zero-shot setting; in contrast, it improved accuracy by about 10% in the few-shot setting. Further analysis of the visualized data suggests trends in the words and senses that the model struggles to classify correctly.
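As a rough illustration of the few-shot setting described in the abstract, the sketch below assembles a few-shot prompt for classifying the sense of a katakana word in context. The example words, sense labels, sentences, and prompt wording are illustrative assumptions, not taken from the paper or the BCCWJ.

```python
# Hypothetical sketch of few-shot prompting for katakana word sense
# classification. All examples and labels here are invented for
# illustration; the paper's actual prompts and data are not shown.

def build_few_shot_prompt(examples, target_word, target_sentence):
    """Assemble a few-shot classification prompt for an LLM."""
    blocks = ["Classify the sense of the katakana word in each sentence."]
    for word, sentence, sense in examples:
        blocks.append(f"Word: {word}\nSentence: {sentence}\nSense: {sense}")
    # The final block leaves "Sense:" blank for the model to complete.
    blocks.append(f"Word: {target_word}\nSentence: {target_sentence}\nSense:")
    return "\n\n".join(blocks)

# Wasei-Eigo examples whose senses diverge from the English source words.
examples = [
    ("マンション", "新しいマンションに引っ越した。", "condominium"),
    ("サラリーマン", "彼はサラリーマンとして働いている。", "office worker"),
]
prompt = build_few_shot_prompt(examples, "ハンドル", "車のハンドルを握る。")
print(prompt)
```

A fine-tuned model would then receive this prompt and emit the sense label; in the zero-shot setting, the `examples` list would simply be empty.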