Yusuke Kubota
2026
Cross-lingual and Word-Independent Methods for Quantifying Degree of Grammaticalization
Ryo Nagata | Daichi Mochihashi | Misato Ido | Yusuke Kubota | Naoki Otani | Yoshifumi Kawasaki | Hiroya Takamura
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Grammaticalization denotes a diachronic change in grammatical category from content words to function words. One of the intensively explored directions in this area is to quantify the degree of grammaticalization. There have been a limited number of automated methods for this task, and the existing best-performing method is heavily language- and word-dependent. In this paper, we explore three methods for quantifying the degree of grammaticalization, which are applicable to a wider variety of words and languages. The difficulty here is that training data is not available for the present task. We overcome this difficulty by using Positive-Unlabeled learning (PU-learning) or Cross-Validation-like learning (hereafter, CV-learning). Experiments show that the CV-learning-based method achieves moderate to high correlations with human judgments for English deverbal prepositions and Japanese nouns undergoing grammaticalization. With this method, we further explore words possibly being grammaticalized and counterexamples to the unidirectionality hypothesis.
2024
Is Structure Dependence Shaped for Efficient Communication?: A Case Study on Coordination
Kohei Kajikawa | Yusuke Kubota | Yohei Oseki
Proceedings of the 28th Conference on Computational Natural Language Learning
Natural language exhibits various universal properties. But why do these universals exist? One explanation is that they arise from functional pressures to achieve efficient communication, a view which attributes cross-linguistic properties to domain-general cognitive abilities. This hypothesis has successfully addressed some syntactic universal properties such as compositionality and Greenbergian word order universals. However, more abstract syntactic universals have not been explored from the perspective of efficient communication. Among such universals, the most notable one is structure dependence, that is, grammar-internal operations crucially depend on hierarchical representations. This property has traditionally been taken to be central to natural language and to involve domain-specific knowledge irreducible to communicative efficiency. In this paper, we challenge the conventional view by investigating whether structure dependence realizes efficient communication, focusing on coordinate structures. We design three types of artificial languages: (i) one with a structure-dependent reduction operation, which is similar to natural language, (ii) one without any reduction operations, and (iii) one with a linear (rather than structure-dependent) reduction operation. We quantify the communicative efficiency of these languages. The results demonstrate that the language with the structure-dependent reduction operation is significantly more communicatively efficient than the counterfactual languages. This suggests that the existence of structure-dependent properties can be explained from the perspective of efficient communication.
2020
Development of a General-Purpose Categorial Grammar Treebank
Yusuke Kubota | Koji Mineshima | Noritsugu Hayashi | Shinya Okano
Proceedings of the Twelfth Language Resources and Evaluation Conference
This paper introduces ABC Treebank, a general-purpose categorial grammar (CG) treebank for Japanese. It is ‘general-purpose’ in the sense that it is not tailored to a specific variant of CG, but rather aims to offer a theory-neutral linguistic resource (as much as possible) which can be converted to different versions of CG (specifically, CCG and Type-Logical Grammar) relatively easily. In terms of linguistic analysis, it improves over the existing Japanese CG treebank (Japanese CCGBank) on the treatment of certain linguistic phenomena (passives, causatives, and control/raising predicates) for which the lexical specification of the syntactic information reflecting local dependencies turns out to be crucial. In this paper, we describe the underlying ‘theory’ dubbed ABC Grammar that is taken as a basis for our treebank, outline the general construction of the corpus, and report on some preliminary results applying the treebank in a semantic parsing system for generating logical representations of sentences.
2019
Probing the nature of an island constraint with a parsed corpus
Yusuke Kubota | Ai Kubota
Linguistic Issues in Language Technology, Volume 18, 2019 - Exploiting Parsed Corpora: Applications in Research, Pedagogy, and Processing
This paper presents a case study of the use of the NINJAL Parsed Corpus of Modern Japanese (NPCMJ) for syntactic research. NPCMJ is the first phrase structure-based treebank for Japanese that is specifically designed for application in linguistic (in addition to NLP) research. After discussing some basic methodological issues pertaining to the use of treebanks for theoretical linguistics research, we introduce our case study on the status of the Coordinate Structure Constraint (CSC) in Japanese, showing that NPCMJ enables us to easily retrieve examples that support one of the key claims of Kubota and Lee (2015): that the CSC should be viewed as a pragmatic, rather than a syntactic constraint. The corpus-based study we conducted moreover revealed a previously unnoticed tendency that was highly relevant for further clarifying the principles governing the empirical data in question. We conclude the paper by briefly discussing some further methodological issues brought up by our case study pertaining to the relationship between linguistic research and corpus development.
Underspecification and interpretive parallelism in Dependent Type Semantics
Yusuke Kubota | Koji Mineshima | Robert Levine | Daisuke Bekki
Proceedings of the IWCS 2019 Workshop on Computing Semantics with Types, Frames and Related Structures