Output Trend Analysis in Semantic Classification of Katakana Words Using a Large Language Model

Kazuki Kodaki, Minoru Sasaki


Abstract
In semantic classification of katakana words using a large language model (LLM), semantic divergences from the meanings of the original English words, as in Wasei-Eigo (Japanese-made English), may affect the accuracy of the model. To capture the meaning of such loanwords accurately, we fine-tuned the LLM using data extracted from the BCCWJ (Balanced Corpus of Contemporary Written Japanese), analyzed the current accuracy and output trends of semantic classification for katakana words, and explored ways to improve accuracy. Across several experiments, fine-tuning was not effective in the zero-shot setting, whereas it improved accuracy by about 10% in the few-shot setting. Further analysis of the visualized data reveals trends in the words and meanings that the model struggles to classify correctly.
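For readers unfamiliar with the few-shot setting referred to in the abstract, the sketch below shows one way a sense-classification prompt for a katakana word could be assembled. This is a hypothetical illustration only, not the authors' actual prompt format, model, or sense inventory; the example sentences, sense labels, and helper names are invented for exposition.

# Hypothetical sketch of a few-shot prompt for katakana word-sense
# classification; the example words, sense labels, and sentences are
# invented and do not come from the paper or the BCCWJ data.

FEW_SHOT_EXAMPLES = [
    # (sentence, target katakana word, sense label)
    ("新しいマンションに引っ越した。", "マンション", "apartment building (not English 'mansion')"),
    ("会議でクレームが相次いだ。", "クレーム", "complaint (not legal 'claim')"),
]

def build_prompt(sentence: str, word: str) -> str:
    """Assemble a few-shot classification prompt for a target katakana word."""
    lines = ["Classify the sense of the katakana word in each sentence."]
    for ex_sentence, ex_word, ex_sense in FEW_SHOT_EXAMPLES:
        lines.append(f"Sentence: {ex_sentence}\nWord: {ex_word}\nSense: {ex_sense}")
    lines.append(f"Sentence: {sentence}\nWord: {word}\nSense:")
    return "\n\n".join(lines)

if __name__ == "__main__":
    # The resulting prompt would be passed to the (fine-tuned) LLM;
    # here we only print it for inspection.
    print(build_prompt("駅前に大きなマンションが建った。", "マンション"))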
Anthology ID:
2025.ranlp-1.66
Volume:
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
Month:
September
Year:
2025
Address:
Varna, Bulgaria
Editors:
Galia Angelova, Maria Kunilovskaya, Marie Escribe, Ruslan Mitkov
Venue:
RANLP
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
567–571
URL:
https://aclanthology.org/2025.ranlp-1.66/
Cite (ACL):
Kazuki Kodaki and Minoru Sasaki. 2025. Output Trend Analysis in Semantic Classification of Katakana Words Using a Large Language Model. In Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era, pages 567–571, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
Output Trend Analysis in Semantic Classification of Katakana Words Using a Large Language Model (Kodaki & Sasaki, RANLP 2025)
PDF:
https://aclanthology.org/2025.ranlp-1.66.pdf