Kanako Komiya
2025
Large-Scale Japanese Metaphor Corpus Construction: Expanding BCCWJ-Metaphor with Automated Annotation
Hang Zhu | Rowan Hall Maudslay | Kanako Komiya | Sachi Kato | Masayuki Asahara
Proceedings of the 39th Pacific Asia Conference on Language, Information and Computation
Structure Modeling Approach for UD Parsing of Historical Modern Japanese
Hiroaki Ozaki | Mai Omura | Kanako Komiya | Masayuki Asahara | Toshinobu Ogiso
Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025)
This study shows the effectiveness of structure modeling for transfer ability in diachronic syntactic parsing. Syntactic parsing for historical languages is significant from a humanities and quantitative-linguistics perspective because it enables annotation support and analysis of unannotated documents. We compared the zero-shot transfer ability of Transformer-based biaffine UD parsers with that of our structure modeling approach. The structure modeling approach is a pipeline consisting of dictionary-based morphological analysis (MeCab), deep learning-based phrase (bunsetsu) analysis (Monaka), SVM-based phrase dependency parsing (CaboCha), and a rule-based conversion from phrase dependencies to UD. This pipeline closely follows the methodology used in constructing Japanese UD corpora. Experimental results showed that the structure modeling approach outperformed zero-shot transfer from contemporary to modern Japanese. Moreover, the structure modeling approach outperformed several existing UD parsers on contemporary Japanese. In sum, the structure modeling approach outperformed alternatives in the diachronic transfer of Japanese by a wide margin and is useful for applications in digital humanities and quantitative linguistics.
2024
Long Unit Word Tokenization and Bunsetsu Segmentation of Historical Japanese
Hiroaki Ozaki | Kanako Komiya | Masayuki Asahara | Toshinobu Ogiso
Proceedings of the 1st Workshop on Machine Learning for Ancient Languages (ML4AL 2024)
In Japanese, the natural minimal phrase of a sentence is the “bunsetsu”; for native speakers it serves as a more natural boundary within a sentence than words do, and grammatical analysis in Japanese linguistics therefore commonly operates on bunsetsu units. In contrast, because Japanese has no delimiters between words, there are two major categories of word definition, namely Short Unit Words (SUWs) and Long Unit Words (LUWs). Although an SUW dictionary is available, no LUW dictionary exists. Hence, this study focuses on providing a deep learning-based (or LLM-based) bunsetsu and LUW analyzer for the Heian period (AD 794-1185) and evaluating its performance. We model the parser as a Transformer-based joint sequence-labeling model that combines a bunsetsu BI tag, an LUW BI tag, and an LUW part-of-speech (POS) tag for each SUW token. We train our models on corpora from each period, including contemporary and historical Japanese. F1 scores range from 0.976 to 0.996 for both bunsetsu and LUW reconstruction, indicating that our models achieve performance comparable to models for a contemporary Japanese corpus. A statistical analysis and a diachronic case study suggest that the estimation of bunsetsu may be influenced by the grammaticalization of morphemes.
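The joint sequence-labeling scheme described in the abstract above assigns each SUW token a bunsetsu BI tag, an LUW BI tag, and an LUW POS tag. A minimal decoding sketch (with invented tokens and tags; the actual tagger, training data, and tagset are the paper's own) shows how spans can be reconstructed from such labels:

```python
# Reconstruct bunsetsu and Long Unit Word (LUW) spans from per-SUW-token
# joint labels (bunsetsu BI, LUW BI, LUW POS), as in the sequence-labeling
# formulation above. Tokens and tags here are invented for illustration.

def spans_from_bi(tags):
    """Group token indices into spans: each 'B' starts a new span."""
    spans, current = [], []
    for i, tag in enumerate(tags):
        if tag == "B" and current:
            spans.append(current)
            current = []
        current.append(i)
    if current:
        spans.append(current)
    return spans

# Hypothetical SUW tokens with joint labels.
suws = ["歩き", "つつ", "本", "を", "読む"]
bunsetsu_bi = ["B", "I", "B", "I", "I"]
luw_bi = ["B", "I", "B", "B", "B"]
luw_pos = ["動詞", "動詞", "名詞", "助詞", "動詞"]

bunsetsu = ["".join(suws[i] for i in s) for s in spans_from_bi(bunsetsu_bi)]
luws = [("".join(suws[i] for i in s), luw_pos[s[0]])
        for s in spans_from_bi(luw_bi)]

print(bunsetsu)  # ['歩きつつ', '本を読む']
print(luws)      # each LUW takes the POS tag of its first SUW
```

Predicting the three tags jointly lets a single encoder share context between the bunsetsu and LUW decisions, which is presumably why the paper combines them into one label per SUW token.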
Analysis of cross-linguality of XL-WSD dataset: A comparative study of Japanese and Dutch
Naranbuuvei Ganbat | Soma Asada | Kanako Komiya
Proceedings of the 38th Pacific Asia Conference on Language, Information and Computation
2023
Translation from Historical to Contemporary Japanese Using Japanese T5
Hisao Usui | Kanako Komiya
Proceedings of the Joint 3rd International Conference on Natural Language Processing for Digital Humanities and 8th International Workshop on Computational Linguistics for Uralic Languages
This paper presents machine translation from historical Japanese to contemporary Japanese using a Text-to-Text Transfer Transformer (T5). A previous study using neural machine translation (NMT) with Long Short-Term Memory (LSTM) could not outperform work that used statistical machine translation (SMT). Because an NMT model tends to require more training data than an SMT model, the scarcity of parallel data between historical and contemporary Japanese could be the reason. Therefore, we used Japanese T5, a kind of large language model, to compensate for the lack of data. Our experiments show that translation quality with T5 is slightly lower than with SMT. In addition, we prepended to each input the title of the literary work from which the example sentence was extracted. The Japanese historical corpus consists of a variety of texts spanning different periods and writing styles, so we expected the title to give the translation model information about period and style. Additional experiments revealed that, with title information, translation from historical to contemporary Japanese with T5 surpassed that with SMT.
All-Words Word Sense Disambiguation for Historical Japanese
Soma Asada | Kanako Komiya | Masayuki Asahara
Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation
2022
Reputation Analysis Using Key Phrases and Sentiment Scores Extracted from Reviews
Yipu Huang | Minoru Sasaki | Kanako Komiya
Proceedings of the 36th Pacific Asia Conference on Language, Information and Computation
Word Sense Disambiguation of Corpus of Historical Japanese Using Japanese BERT Trained with Contemporary Texts
Kanako Komiya | Nagi Oki | Masayuki Asahara
Proceedings of the 36th Pacific Asia Conference on Language, Information and Computation
2020
Automatic Creation of Correspondence Table of Meaning Tags from Two Dictionaries in One Language Using Bilingual Word Embedding
Teruo Hirabayashi | Kanako Komiya | Masayuki Asahara | Hiroyuki Shinnou
Proceedings of the 13th Workshop on Building and Using Comparable Corpora
In this paper, we show how to use bilingual word embeddings (BWE) to automatically create a correspondence table of meaning tags from two dictionaries in one language, and we examine the effectiveness of the method. One difficulty is that the meaning tags do not always correspond one-to-one, because the granularities of the word senses and the concepts differ. Therefore, we regarded the concept tag that corresponds most closely to a word sense as the correct concept tag for that sense. We used two BWE methods, a linear transformation matrix and VecMap, and evaluated the most frequent sense (MFS) method and the corpus concatenation method for comparison. The accuracies of the proposed methods were higher than that of the random baseline but lower than those of the MFS and corpus concatenation methods. However, because our method utilizes the embedding vectors of the word senses, the relations of the sense tags corresponding to concept tags can be examined by mapping the sense embeddings into the vector space of the concept tags. Moreover, our methods can be applied when only concept or word sense embeddings are available, whereas the MFS method requires a parallel corpus and the corpus concatenation method needs two tagged corpora.
Generation and Evaluation of Concept Embeddings Via Fine-Tuning Using Automatically Tagged Corpus
Kanako Komiya | Daiki Yaginuma | Masayuki Asahara | Hiroyuki Shinnou
Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation
Composing Word Vectors for Japanese Compound Words Using Bilingual Word Embeddings
Teruo Hirabayashi | Kanako Komiya | Masayuki Asahara | Hiroyuki Shinnou
Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation
Neural Machine Translation from Historical Japanese to Contemporary Japanese Using Diachronically Domain-Adapted Word Embeddings
Masashi Takaku | Tosho Hirasawa | Mamoru Komachi | Kanako Komiya
Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation
2018
All-words Word Sense Disambiguation Using Concept Embeddings
Rui Suzuki | Kanako Komiya | Masayuki Asahara | Minoru Sasaki | Hiroyuki Shinnou
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Investigating Effective Parameters for Fine-tuning of Word Embeddings Using Only a Small Corpus
Kanako Komiya | Hiroyuki Shinnou
Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP
Fine-tuning is a popular method to achieve better performance when only a small target corpus is available. However, it requires tuning a number of metaparameters and thus carries a risk of adverse effects when inappropriate metaparameters are used. Therefore, we investigate effective parameters for fine-tuning when only a small target corpus is available. In the current study, we aim to improve Japanese word embeddings created from a huge corpus. First, we demonstrate that even word embeddings created from a huge corpus are affected by domain shift. We then investigate effective parameters for fine-tuning the word embeddings using a small target corpus. We used the perplexity of a language model obtained from a Long Short-Term Memory network to assess the word embeddings input into the network. The experiments revealed that fine-tuning sometimes has an adverse effect when only a small target corpus is used, and that batch size is the most important parameter for fine-tuning. In addition, we confirmed that the effect of fine-tuning is greater when the target corpus is larger.
Domain Adaptation for Sentiment Analysis using Keywords in the Target Domain as the Learning Weight
Jing Bai | Hiroyuki Shinnou | Kanako Komiya
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation
Domain Adaptation Using a Combination of Multiple Embeddings for Sentiment Analysis
Hiroyuki Shinnou | Xinyu Zhao | Kanako Komiya
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation
Fine-tuning for Named Entity Recognition Using Part-of-Speech Tagging
Masaya Suzuki | Kanako Komiya | Minoru Sasaki | Hiroyuki Shinnou
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation
2017
Japanese all-words WSD system using the Kyoto Text Analysis ToolKit
Hiroyuki Shinnou | Kanako Komiya | Minoru Sasaki | Shinsuke Mori
Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation
2016
Comparison of Annotating Methods for Named Entity Corpora
Kanako Komiya | Masaya Suzuki | Tomoya Iwakura | Minoru Sasaki | Hiroyuki Shinnou
Proceedings of the 10th Linguistic Annotation Workshop held in conjunction with ACL 2016 (LAW-X 2016)
Constructing a Japanese Basic Named Entity Corpus of Various Genres
Tomoya Iwakura | Kanako Komiya | Ryuichi Tachibana
Proceedings of the Sixth Named Entity Workshop
Supervised Word Sense Disambiguation with Sentences Similarities from Context Word Embeddings
Shoma Yamaki | Hiroyuki Shinnou | Kanako Komiya | Minoru Sasaki
Proceedings of the 30th Pacific Asia Conference on Language, Information and Computation: Oral Papers
2015
Domain Adaptation with Filtering for Named Entity Extraction of Japanese Anime-Related Words
Kanako Komiya | Daichi Edamura | Ryuta Tamura | Minoru Sasaki | Hiroyuki Shinnou | Yoshiyuki Kotani
Proceedings of the International Conference Recent Advances in Natural Language Processing
Surrounding Word Sense Model for Japanese All-words Word Sense Disambiguation
Kanako Komiya | Yuto Sasaki | Hajime Morita | Minoru Sasaki | Hiroyuki Shinnou | Yoshiyuki Kotani
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation
Hybrid Method of Semi-supervised Learning and Feature Weighted Learning for Domain Adaptation of Document Classification
Hiroyuki Shinnou | Liying Xiao | Minoru Sasaki | Kanako Komiya
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation
Learning under Covariate Shift for Domain Adaptation for Word Sense Disambiguation
Hiroyuki Shinnou | Minoru Sasaki | Kanako Komiya
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation: Posters
Unsupervised Domain Adaptation for Word Sense Disambiguation using Stacked Denoising Autoencoder
Kazuhei Kouno | Hiroyuki Shinnou | Minoru Sasaki | Kanako Komiya
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation: Posters
2012
Automatic Domain Adaptation for Word Sense Disambiguation Based on Comparison of Multiple Classifiers
Kanako Komiya | Manabu Okumura
Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation
The Transliteration from Alphabet Queries to Japanese Product Names
Rieko Tsuji | Yoshinori Nemoto | Wimvipa Luangpiensamut | Yuji Abe | Takeshi Kimura | Kanako Komiya | Koji Fujimoto | Yoshiyuki Kotani
Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation
2011
Automatic Determination of a Domain Adaptation Method for Word Sense Disambiguation Using Decision Tree Learning
Kanako Komiya | Manabu Okumura
Proceedings of 5th International Joint Conference on Natural Language Processing
Negation Naive Bayes for Categorization of Product Pages on the Web
Kanako Komiya | Naoto Sato | Koji Fujimoto | Yoshiyuki Kotani
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011
Co-authors
- Hiroyuki Shinnou 16
- Minoru Sasaki 11
- Masayuki Asahara 9
- Yoshiyuki Kotani 4
- Manabu Okumura 3
- Soma Asada 2
- Koji Fujimoto 2
- Teruo Hirabayashi 2
- Tomoya Iwakura 2
- Toshinobu Ogiso 2
- Hiroaki Ozaki 2
- Masaya Suzuki 2
- Yuji Abe 1
- Jing Bai 1
- Daichi Edamura 1
- Naranbuuvei Ganbat 1
- Tosho Hirasawa 1
- Yipu Huang 1
- Sachi Kato 1
- Takeshi Kimura 1
- Mamoru Komachi 1
- Kazuhei Kouno 1
- Wimvipa Luangpiensamut 1
- Rowan Hall Maudslay 1
- Shinsuke Mori 1
- Hajime Morita 1
- Yoshinori Nemoto 1
- Nagi Oki 1
- Mai Omura 1
- Yuto Sasaki 1
- Naoto Sato 1
- Kiyoaki Shirai 1
- Rui Suzuki 1
- Ryuichi Tachibana 1
- Masashi Takaku 1
- Ryuta Tamura 1
- Rieko Tsuji 1
- Hisao Usui 1
- Liying Xiao 1
- Daiki Yaginuma 1
- Shoma Yamaki 1
- Hikaru Yokono 1
- Xinyu Zhao 1
- Hang Zhu 1