Kanako Komiya

2025

Large-Scale Japanese Metaphor Corpus Construction: Expanding BCCWJ-Metaphor with Automated Annotation
Hang Zhu | Rowan Hall Maudslay | Kanako Komiya | Sachi Kato | Masayuki Asahara
Proceedings of the 39th Pacific Asia Conference on Language, Information and Computation

pdf bib abs

Structure Modeling Approach for UD Parsing of Historical Modern Japanese
Hiroaki Ozaki | Mai Omura | Kanako Komiya | Masayuki Asahara | Toshinobu Ogiso
Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025)

This study shows the effectiveness of structure modeling for transfer ability in diachronic syntactic parsing. The syntactic parsing for historical languages is significant from a humanities and quantitative linguistics perspective to enable annotation support and analysis on unannotated documents.We compared the zero-shot transfer ability between Transformer-based Biaffine UD parsers and our structure modeling approach. The structure modeling approach is a pipeline method consisting with dictionary-based morphological analysis (MeCab), a deep learning-based phrase (bunsetsu) analysis (Monaka), SVM-based phrase dependency parsing (CaboCha) and a rule-based conversion from phrase dependencies to UD.This pipeline closely follows the methodology used in constructing Japanese UD corpora.Experimental results showed that the structure modeling approach outperformed zero-shot transfer from the contemporary to the modern Japanese. Moreover, the structure modeling approach outperformed several existing UD parsers in contemporary Japanese. To this end, the structure modeling approach outperformed in the diachronic transfer of Japanese by a wide margin and was useful to those applications for digital humanities and quantitative linguistics.

2024

pdf bib abs

Long Unit Word Tokenization and Bunsetsu Segmentation of Historical Japanese
Hiroaki Ozaki | Kanako Komiya | Masayuki Asahara | Toshinobu Ogiso
Proceedings of the 1st Workshop on Machine Learning for Ancient Languages (ML4AL 2024)

In Japanese, the natural minimal phrase of a sentence is the “bunsetsu” and it serves as a natural boundary of a sentence for native speakers rather than words, and thus grammatical analysis in Japanese linguistics commonly operates on the basis of bunsetsu units.In contrast, because Japanese does not have delimiters between words, there are two major categories of word definition, namely, Short Unit Words (SUWs) and Long Unit Words (LUWs).Though a SUW dictionary is available, LUW is not.Hence, this study focuses on providing deep learning-based (or LLM-based) bunsetsu and Long Unit Words analyzer for the Heian period (AD 794-1185) and evaluating its performances.We model the parser as transformer-based joint sequential labels model, which combine bunsetsu BI tag, LUW BI tag, and LUW Part-of-Speech (POS) tag for each SUW token.We train our models on corpora of each period including contemporary and historical Japanese.The results range from 0.976 to 0.996 in f1 value for both bunsetsu and LUW reconstruction indicating that our models achieve comparable performance with models for a contemporary Japanese corpus.Through the statistical analysis and diachronic case study, the estimation of bunsetsu could be influenced by the grammaticalization of morphemes.

Kanako Komiya

2025

2024

2023

2022

2020

2018

2017

2016

2015

2012

2011

2010

Co-authors

Venues