Daichi Mochihashi

2026

Statistical Semantic Change Detection via Usage Similarities
Taichi Aida | Daichi Mochihashi | Hiroya Takamura | Toshinobu Ogiso | Mamoru Komachi
The Proceedings for the 6th International Workshop on Computational Approaches to Language Change (LChange’26)

Semantic change detection comprises two subtasks: classification, which predicts whether a target word has undergone a semantic shift, and ranking, which orders words according to the degree of their semantic change. While most prior studies concentrated on the ranking subtask, the classification subtask plays an equally important role, since many practical scenarios require a yes/no decision on semantic change rather than a global ranking. In this work, we propose a novel statistical method that predicts the presence or absence of semantic change. While most existing approaches infer semantic change by comparing word embeddings across time periods or domains, our method directly models the diachronic/synchronic consistency of usage-level similarity scores. Our experiments on SemEval-2020 Task 1 and WUGS datasets demonstrate that the proposed formulation outperforms existing state-of-the-art embedding-based methods, and robustly detects semantic change across languages in both diachronic and synchronic settings.

Cross-lingual and Word-Independent Methods for Quantifying Degree of Grammaticalization
Ryo Nagata | Daichi Mochihashi | Misato Ido | Yusuke Kubota | Naoki Otani | Yoshifumi Kawasaki | Hiroya Takamura
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Grammaticalization denotes a diachronic change of the grammatical category from content words to function words. One of the intensively explored directions in this area is to quantify the degree of grammaticalization. There have been a limited number of automated methods for this task, and the existing best-performing method is heavily language- and word-dependent. In this paper, we explore three methods for quantifying the degree of grammaticalization, which are applicable to a wider variety of words and languages. The difficulty here is that training data is not available in the present task. We overcome this difficulty by using Positive-Unlabeled learning (PU-learning) or Cross-Validation-like learning (hereafter, CV-learning). Experiments show that the CV-learning-based method achieves moderate to high correlations with human judgments for English deverbal prepositions and Japanese nouns being grammaticalized. With this method, we further explore words possibly being grammaticalized and counterexamples of the unidirectionality hypothesis.

2025

The meanings and relationships of words shift over time. This phenomenon is referred to as semantic shift. Research on how semantic shifts occur over multiple time periods is essential for understanding them in detail. However, detecting change points only between adjacent time periods is insufficient for analyzing detailed semantic shifts, and using BERT-based methods to examine word sense proportions incurs a high computational cost. To address these issues, we propose a simple yet intuitive framework for analyzing how semantic shifts occur over multiple time periods by utilizing similarity matrices based on word embeddings. We calculate diachronic word similarity matrices using fast and lightweight word embeddings across arbitrary time periods, enabling a deeper analysis of continuous semantic shifts. Additionally, by clustering the resulting similarity matrices, we can categorize words that exhibit similar semantic shift behavior in an unsupervised manner.
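
As a rough illustration of the similarity matrices mentioned in this abstract, the sketch below computes pairwise cosine similarities between one word's embeddings from several time periods. It assumes the per-period vectors already live in a shared space; the function names and the toy vectors are hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm

def diachronic_similarity_matrix(vectors_by_period):
    """Entry [s][t] compares the word's vector in period s with its vector in period t."""
    return [[cosine(u, v) for v in vectors_by_period] for u in vectors_by_period]

# Hypothetical embeddings of a single word in three time periods (assumed aligned).
toy_vectors = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
sim = diachronic_similarity_matrix(toy_vectors)
```

In such a matrix, a word whose usage drifts gradually would show similarity falling off smoothly away from the diagonal, whereas an abrupt shift would tend to produce a block structure.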

2024

Learning Adverbs with Spectral Mixture Kernels
Tomoe Taniguchi | Daichi Mochihashi | Ichiro Kobayashi
Findings of the Association for Computational Linguistics: ACL 2024

For humans and robots to collaborate more in the real world, robots need to understand human intentions from the different manners of their behavior. In our study, we focus on the meaning of adverbs which describe human motions. We propose a topic model, Hierarchical Dirichlet Process-Spectral Mixture Latent Dirichlet Allocation, which concurrently learns the relationship between those human motions and those adverbs by capturing the frequency kernels that represent motion characteristics and the shared topics of adverbs that depict such motions. We trained the model on datasets we made from movies about “walking” and “dancing”, and found that our model outperforms representative neural network models in terms of perplexity score. We also demonstrated our model’s ability to determine the adverbs for a given motion and confirmed that the model predicts more appropriate adverbs.

2023

Holographic CCG Parsing
Ryosuke Yamaki | Tadahiro Taniguchi | Daichi Mochihashi
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We propose a method for formulating CCG as a recursive composition in a continuous vector space. Recent CCG supertagging and parsing models generally demonstrate high performance, yet rely on black-box neural architectures to implicitly model phrase structure dependencies. Instead, we leverage the method of holographic embeddings as a compositional operator to explicitly model the dependencies between words and phrase structures in the embedding space. Experimental results revealed that holographic composition effectively improves the supertagging accuracy to achieve state-of-the-art parsing performance when using a C&C parser. The proposed span-based parsing algorithm using holographic composition achieves performance comparable to state-of-the-art neural parsing with Transformers. Furthermore, our model can semantically and syntactically infill text at the phrase level due to the decomposability of holographic composition.
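
The abstract does not spell out the compositional operator. The sketch below assumes it is circular correlation, the operation commonly associated with holographic embeddings, and shows it in plain Python. The function name and toy vectors are hypothetical, and this is not the authors' parser.

```python
def circular_correlation(a, b):
    """Return c with c[k] = sum_i a[i] * b[(i + k) % d] for two length-d vectors."""
    d = len(a)
    if len(b) != d:
        raise ValueError("both vectors must have the same dimension")
    return [sum(a[i] * b[(i + k) % d] for i in range(d)) for k in range(d)]

# Hypothetical 3-dimensional embeddings of a left and a right constituent.
left = [0, 1, 0]
right = [1, 2, 3]

# The operation is not commutative, so the order of the constituents matters.
left_right = circular_correlation(left, right)
right_left = circular_correlation(right, left)
```

The output has the same dimension as the inputs, so the operation can be applied recursively up a parse tree. This naive version costs O(d^2); the same result is usually computed in O(d log d) with fast Fourier transforms.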

Scale-Invariant Infinite Hierarchical Topic Model
Shusei Eshima | Daichi Mochihashi
Findings of the Association for Computational Linguistics: ACL 2023

Hierarchical topic models have been employed to organize a large number of diverse topics from corpora into a latent tree structure. However, existing models yield fragmented topics with overlapping themes whose expected probability becomes exponentially smaller along the depth of the tree. To solve this intrinsic problem, we propose a scale-invariant infinite hierarchical topic model (ihLDA). The ihLDA adaptively adjusts the topic creation to make the expected topic probability decay considerably slower than that in existing models. Thus, it facilitates the estimation of deeper topic structures encompassing diverse topics in a corpus. Furthermore, the ihLDA extends a widely used tree-structured prior (Adams et al., 2010) in a hierarchical Bayesian way, which enables drawing an infinite topic tree from the base tree while efficiently sampling the topic assignments for the words. Experiments demonstrate that the ihLDA has better topic uniqueness and hierarchical diversity than existing approaches, including state-of-the-art neural models.

2022

Infinite SCAN: An Infinite Model of Diachronic Semantic Change
Seiichi Inoue | Mamoru Komachi | Toshinobu Ogiso | Hiroya Takamura | Daichi Mochihashi
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

In this study, we propose a Bayesian model that can jointly estimate the number of senses of words and their changes through time. The model combines a dynamic topic model on Gaussian Markov random fields with a logistic stick-breaking process that realizes a Dirichlet process. In the experiments, we evaluated the proposed model in terms of interpretability, accuracy in estimating the number of senses, and tracking their changes using both artificial data and real data. We quantitatively verified that the model behaves as expected through evaluation using artificial data. Using the CCOHA corpus, we showed that our model outperforms the baseline model and investigated the semantic changes of several well-known target words.
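
To make the logistic stick-breaking construction concrete, here is a minimal sketch that turns real-valued scores into sense proportions. It illustrates only the stick-breaking step: the Gaussian draws are placeholders for whatever the full model would supply, and the truncation to five senses, the function names, and the random seed are all assumptions made for illustration.

```python
import math
import random

def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def logistic_stick_breaking(psi):
    """Map real scores to weights: w[k] = sigmoid(psi[k]) * prod_{j<k} (1 - sigmoid(psi[j]))."""
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to any sense
    for value in psi:
        fraction = sigmoid(value)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights, remaining

# Placeholder scores (assumption): independent standard normal draws.
rng = random.Random(0)
psi = [rng.gauss(0.0, 1.0) for _ in range(5)]
weights, leftover = logistic_stick_breaking(psi)
```

The returned `leftover` is the mass reserved for senses beyond the truncation, which is what lets the number of senses remain open-ended in principle.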

2021

A Comprehensive Analysis of PMI-based Models for Measuring Semantic Differences
Taichi Aida | Mamoru Komachi | Toshinobu Ogiso | Hiroya Takamura | Daichi Mochihashi
Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation

2020

How LSTM Encodes Syntax: Exploring Context Vectors and Semi-Quantization on Natural Text
Chihiro Shibata | Kei Uchiumi | Daichi Mochihashi
Proceedings of the 28th International Conference on Computational Linguistics

The Long Short-Term Memory recurrent neural network (LSTM) is widely used and known to capture informative long-term syntactic dependencies. However, how such information is reflected in its internal vectors for natural text has not yet been sufficiently investigated. We analyze them by learning a language model where syntactic structures are implicitly given. We empirically show that the context update vectors, i.e. outputs of internal gates, are approximately quantized to binary or ternary values to help the language model to count the depth of nesting accurately, as Suzgun et al. (2019) recently showed for synthetic Dyck languages. For some dimensions in the context vector, we show that their activations are highly correlated with the depth of phrase structures, such as VP and NP. Moreover, with an L1 regularization, we also found that it can accurately predict whether a word is inside a phrase structure or not from a small number of components of the context vector. Even for the case of learning from raw text, context vectors are shown to still correlate well with the phrase structures. Finally, we show that natural clusters of the function words and the parts of speech that trigger phrases are represented in a small but principal subspace of the context-update vector of LSTM.

2017

Nonparametric Bayesian Semi-supervised Word Segmentation
Ryo Fujii | Ryo Domoto | Daichi Mochihashi
Transactions of the Association for Computational Linguistics, Volume 5

This paper presents a novel hybrid generative/discriminative model of word segmentation based on nonparametric Bayesian methods. Unlike ordinary discriminative word segmentation which relies only on labeled data, our semi-supervised model also leverages huge amounts of unlabeled text to automatically learn new “words”, and further constrains them by using labeled data to segment non-standard texts such as those found in social networking services. Specifically, our hybrid model combines a discriminative classifier (CRF; Lafferty et al. (2001)) and unsupervised word segmentation (NPYLM; Mochihashi et al. (2009)), with a transparent exchange of information between these two model structures within the semi-supervised framework (JESS-CM; Suzuki and Isozaki (2008)). We confirmed that it can appropriately segment non-standard texts like those in Twitter and Weibo and has nearly state-of-the-art accuracy on standard datasets in Japanese, Chinese, and Thai.

MIPA: Mutual Information Based Paraphrase Acquisition via Bilingual Pivoting
Tomoyuki Kajiwara | Mamoru Komachi | Daichi Mochihashi
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

We present a pointwise mutual information (PMI)-based approach to formalize paraphrasability and propose a variant of PMI, called MIPA, for paraphrase acquisition. Our paraphrase acquisition method first acquires lexical paraphrase pairs by bilingual pivoting and then reranks them by PMI and distributional similarity. The complementary nature of information from bilingual corpora and from monolingual corpora makes the proposed method robust. Experimental results show that the proposed method substantially outperforms bilingual pivoting and distributional similarity alone in terms of metrics such as MRR, MAP, coverage, and Spearman’s correlation.
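
For readers unfamiliar with PMI, the sketch below computes standard pointwise mutual information, PMI(x, y) = log(p(x, y) / (p(x) p(y))), from co-occurrence counts and uses it to rank candidate pairs. This is plain PMI, not the MIPA variant, whose definition the abstract does not give. The function name and the toy pairs are hypothetical.

```python
import math
from collections import Counter

def pmi_table(pairs):
    """Return PMI(x, y) = log(p(x, y) / (p(x) * p(y))) for every observed pair."""
    pair_counts = Counter(pairs)
    total = sum(pair_counts.values())
    x_counts = Counter()
    y_counts = Counter()
    for (x, y), count in pair_counts.items():
        x_counts[x] += count
        y_counts[y] += count
    return {
        (x, y): math.log((count / total) / ((x_counts[x] / total) * (y_counts[y] / total)))
        for (x, y), count in pair_counts.items()
    }

# Hypothetical toy data: (word, candidate paraphrase) co-occurrence events.
toy_pairs = [("big", "large")] * 3 + [("big", "small")] + [("tiny", "small")] * 4
scores = pmi_table(toy_pairs)

# Rerank candidate pairs from most to least strongly associated.
ranked = sorted(scores, key=scores.get, reverse=True)
```

Pairs that co-occur more often than their marginal frequencies predict receive positive scores, and pairs that co-occur less often receive negative ones, which is what makes the score usable for reranking.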

Suggesting Sentences for ESL using Kernel Embeddings
Kent Shioda | Mamoru Komachi | Rue Ikeya | Daichi Mochihashi
Proceedings of the 4th Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA 2017)

Sentence retrieval is an important NLP application for English as a Second Language (ESL) learners. ESL learners are familiar with web search engines, but generic web search results may not be adequate for composing documents in a specific domain. However, if we build our own search system specialized to a domain, it may be subject to the data sparseness problem. The recently proposed word2vec partially addresses the data sparseness problem, but fails to extract sentences relevant to queries owing to the modeling of the latent intent of the query. Thus, we propose a method of retrieving example sentences using kernel embeddings and N-gram windows. This method implicitly models the latent intent of queries and sentences, and alleviates the problem of noisy alignment. Our results show that our method achieved higher precision in sentence retrieval for ESL in the domain of a university press release corpus, as compared to a previous unsupervised method used for a semantic textual similarity task.