Koji Mineshima - ACL Anthology

2026

Evaluation of Deontic Conditional Reasoning in Large Language Models: The Case of Wason’s Selection Task
Hirohiko Abe | Kentaro Ozeki | Risako Ando | Takanobu Morishita | Koji Mineshima | Mitsuhiro Okada
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)

As large language models (LLMs) advance in linguistic competence, their reasoning abilities are gaining increasing attention. In humans, reasoning often performs well in domain specific settings, particularly in normative rather than purely formal contexts. Although prior studies have compared LLM and human reasoning, the domain specificity of LLM reasoning remains underexplored. In this study, we introduce a new Wason Selection Task dataset that explicitly encodes deontic modality to systematically distinguish deontic from descriptive conditionals, and use it to examine LLMs’ conditional reasoning under deontic rules. We further analyze whether observed error patterns are better explained by confirmation bias (a tendency to seek rule-supporting evidence) or by matching bias (a tendency to ignore negation and select items that lexically match elements of the rule). Results show that, like humans, LLMs reason better with deontic rules and display matching-bias-like errors. Together, these findings suggest that the performance of LLMs varies systematically across rule types and that their error patterns can parallel well-known human biases in this paradigm.
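The contrast between confirmation bias and matching bias in the abstract above can be made concrete with a small sketch. It is not the authors' dataset or scoring code. The `Rule` class, the card encoding, and the three selection functions are hypothetical names introduced here, and the bias definitions follow the one-line glosses given in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Conditional 'if [not] P then [not] Q'; the flags mark negated parts."""
    neg_antecedent: bool = False
    neg_consequent: bool = False


# A card is (letter, polarity): ("Q", False) stands for the not-Q card.
def correct_selection(rule):
    # Only a card with a true antecedent or a false consequent can falsify the rule.
    return {("P", not rule.neg_antecedent), ("Q", rule.neg_consequent)}


def confirmation_selection(rule):
    # Assumed reading of confirmation bias: pick the cards that make both
    # the antecedent and the consequent true.
    return {("P", not rule.neg_antecedent), ("Q", not rule.neg_consequent)}


def matching_selection(rule):
    # Assumed reading of matching bias: ignore negation and pick the cards
    # named in the rule.
    return {("P", True), ("Q", True)}


plain = Rule()                        # if P then Q
negated = Rule(neg_consequent=True)   # if P then not Q
for rule in (plain, negated):
    print(rule,
          sorted(correct_selection(rule)),
          sorted(confirmation_selection(rule)),
          sorted(matching_selection(rule)))
```

With the plain rule the two biases predict the same cards, so they cannot be told apart. With a negated consequent they diverge, which is why rules containing negation are informative for this kind of analysis.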

2025

Is Partial Linguistic Information Sufficient for Discourse Connective Disambiguation? A Case Study of Concession
Takuma Sato | Ai Kubota | Koji Mineshima
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)

Discourse relations are sometimes explicitly conveyed by specific connectives. However, some connectives can signal multiple discourse relations; in such cases, disambiguation is necessary to determine which relation is intended. This task is known as *discourse connective disambiguation* (Pitler and Nenkova, 2009), and particular attention is often given to connectives that can convey both *concession* and other relations (e.g., *synchronous*). In this study, we conducted experiments to analyze which linguistic features play an important role in the disambiguation of polysemous connectives in Japanese. A neural language model (BERT) was fine-tuned using inputs from which specific linguistic features (e.g., word order, specific lexicon, etc.) had been removed. We analyzed which linguistic features affect disambiguation by comparing the model’s performance. Our results show that even after performing drastic removal, such as deleting one of the two arguments that constitute the discourse relation, the model’s performance remained relatively robust. However, the removal of certain lexical items or words belonging to specific lexical categories significantly degraded disambiguation performance, highlighting their importance in identifying the intended discourse relation.

Normative Reasoning in Large Language Models: A Comparative Benchmark from Logical and Modal Perspectives
Kentaro Ozeki | Risako Ando | Takanobu Morishita | Hirohiko Abe | Koji Mineshima | Mitsuhiro Okada
Proceedings of the 8th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP

Normative reasoning is a type of reasoning that involves normative or deontic modality, such as obligation and permission. While large language models (LLMs) have demonstrated remarkable performance across various reasoning tasks, their ability to handle normative reasoning remains underexplored. In this paper, we systematically evaluate LLMs’ reasoning capabilities in the normative domain from both logical and modal perspectives. Specifically, to assess how well LLMs reason with normative modals, we make a comparison between their reasoning with normative modals and their reasoning with epistemic modals, which share a common formal structure. To this end, we introduce a new dataset covering a wide range of formal patterns of reasoning in both normative and epistemic domains, while also incorporating non-formal cognitive factors that influence human reasoning. Our results indicate that, although LLMs generally adhere to valid reasoning patterns, they exhibit notable inconsistencies in specific types of normative reasoning and display cognitive biases similar to those observed in psychological studies of human reasoning. These findings highlight challenges in achieving logical consistency in LLMs’ normative reasoning and provide insights for enhancing their reliability. All data and code are released publicly at https://github.com/kmineshima/NeuBAROCO.

A Theorem-Proving-Based Evaluation of Neural Semantic Parsing
Hayate Funakura | Hyunsoo Kim | Koji Mineshima
Proceedings of the 8th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP

Graph-matching metrics such as Smatch are the de facto standard for evaluating neural semantic parsers, yet they capture surface overlap rather than logical equivalence. We reassess evaluation by pairing graph-matching with automated theorem proving. We compare two approaches to building parsers: supervised fine-tuning (T5-Small/Base) and few-shot in-context learning (GPT-4o/4.1/5), under normalized and unnormalized targets. We evaluate outputs using graph-matching, bidirectional entailment between source and target formulas with a first-order logic theorem prover, and well-formedness. Across settings, we find that models performing well on graph-matching often fail to produce logically equivalent formulas. Normalization reduces incidental target variability, improves well-formedness, and strengthens logical adequacy. Error analysis shows performance degrades with increasing formula complexity and with coordination, prepositional phrases, and passive voice; the dominant failures involve variable binding and indexing, and predicate naming. These findings highlight limits of graph-based metrics for reasoning-oriented applications and motivate logic-sensitive evaluation and training objectives together with simplified, normalized target representations. All data and code will be publicly released.
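The idea of scoring a parser output by bidirectional entailment rather than surface overlap can be illustrated with a toy check. The sketch below is an assumption-laden stand-in: it uses propositional formulas and brute-force truth tables instead of first-order formulas and a theorem prover, and the function names `entails` and `equivalent` are hypothetical, not taken from the paper.

```python
from itertools import product


def entails(premise, hypothesis, atoms):
    """True if every assignment satisfying `premise` also satisfies `hypothesis`."""
    for values in product([False, True], repeat=len(atoms)):
        env = dict(zip(atoms, values))
        if premise(env) and not hypothesis(env):
            return False
    return True


def equivalent(f, g, atoms):
    # Bidirectional entailment: each formula must entail the other.
    return entails(f, g, atoms) and entails(g, f, atoms)


atoms = ["a", "b"]
gold = lambda e: e["a"] and e["b"]            # target formula: a AND b
reordered = lambda e: e["b"] and e["a"]       # different surface form, same meaning
weaker = lambda e: e["a"] or e["b"]           # shares symbols, different meaning

print(equivalent(gold, reordered, atoms))     # reordering keeps equivalence
print(equivalent(gold, weaker, atoms))        # entailment holds in one direction only
```

A string or graph overlap score would treat `reordered` as partly wrong and `weaker` as partly right, whereas the equivalence check accepts the first and rejects the second.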

2024

Exploring Reasoning Biases in Large Language Models Through Syllogism: Insights from the NeuBAROCO Dataset
Kentaro Ozeki | Risako Ando | Takanobu Morishita | Hirohiko Abe | Koji Mineshima | Mitsuhiro Okada
Findings of the Association for Computational Linguistics: ACL 2024

This paper explores the question of how accurately current large language models can perform logical reasoning in natural language, with an emphasis on whether these models exhibit reasoning biases similar to humans. Specifically, our study focuses on syllogistic reasoning, a form of deductive reasoning extensively studied in cognitive science as a natural form of human reasoning. We present a syllogism dataset called NeuBAROCO, which consists of syllogistic reasoning problems in English and Japanese. This dataset was originally designed for psychological experiments to assess human reasoning capabilities using various forms of syllogisms. Our experiments with leading large language models indicate that these models exhibit reasoning biases similar to humans, along with other error tendencies. Notably, there is significant room for improvement in reasoning problems where the relationship between premises and hypotheses is neither entailment nor contradiction. We also present experimental results and in-depth analysis using a new Chain-of-Thought prompting method, which asks LLMs to translate syllogisms into abstract logical expressions and then explain their reasoning process. Our analysis using this method suggests that the primary limitations of LLMs lie in the reasoning process itself rather than the interpretation of syllogisms.

Annotation of Japanese Discourse Relations Focusing on Concessive Inferences
Ai Kubota | Takuma Sato | Takayuki Amamoto | Ryota Akiyoshi | Koji Mineshima
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In this study, we focus on the inference presupposed in the concessive discourse relation and present the discourse relation annotation for the Japanese connectives ‘nagara’ and ‘tsutsu’, both of which have two usages: Synchronous and Concession, just like English while. We also present the annotation for ‘tokorode’, which is ambiguous in three ways: Temporal, Location, and Concession. While corpora containing concessive discourse relations already exist, the distinctive feature of our study is that it aims to identify the concessive inferential relations by writing out the implicit presupposed inferences. In this paper, we report on the annotation methodology and its results, as well as the characteristics of concession that became apparent during annotation.

2023

Multi-purpose neural network for French categorial grammars
Gaëtan Margueritte | Daisuke Bekki | Koji Mineshima
Proceedings of the 15th International Conference on Computational Semantics

Categorial grammar (CG) is a lexicalized grammar formalism that can be used to identify and extract the semantics of natural language sentences. However, despite being used actively to solve natural language understanding tasks such as natural language inference or recognizing textual entailment, most of the tools exploiting the capacities of CG are available in a limited set of languages. This paper proposes a first step toward developing a set of tools enabling the use of CG for the French language by proposing a neural network tailored for part-of-speech and type-logical-grammar supertagging, located at the frontier between computational linguistics and artificial intelligence. Experiments show that our model can compete with state-of-the-art models while retaining a simple architecture.

Evaluating Large Language Models with NeuBAROCO: Syllogistic Reasoning Ability and Human-like Biases
Risako Ando | Takanobu Morishita | Hirohiko Abe | Koji Mineshima | Mitsuhiro Okada
Proceedings of the 4th Natural Logic Meets Machine Learning Workshop

This paper investigates whether current large language models exhibit biases in logical reasoning, similar to humans. Specifically, we focus on syllogistic reasoning, a well-studied form of inference in the cognitive science of human deduction. To facilitate our analysis, we introduce a dataset called NeuBAROCO, originally designed for psychological experiments that assess human logical abilities in syllogistic reasoning. The dataset consists of syllogistic inferences in both English and Japanese. We examine three types of biases observed in human syllogistic reasoning: belief biases, conversion errors, and atmosphere effects. Our findings demonstrate that current large language models struggle more with problems involving these three types of biases.

Computational Semantics and Evaluation Benchmark for Interrogative Sentences via Combinatory Categorial Grammar
Hayate Funakura | Koji Mineshima
Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation

2022

Annotating Japanese Numeral Expressions for a Logical and Pragmatic Inference Dataset
Kana Koyano | Hitomi Yanaka | Koji Mineshima | Daisuke Bekki
Proceedings of the 18th Joint ACL - ISO Workshop on Interoperable Semantic Annotation within LREC2022

Numeral expressions in Japanese are characterized by the flexibility of quantifier positions and the variety of numeral suffixes. However, little work has been done to build annotated corpora focusing on these features and datasets for testing the understanding of Japanese numeral expressions. In this study, we build a corpus that annotates each numeral expression in an existing phrase structure-based Japanese treebank with its usage and numeral suffix types. We also construct an inference test set for numerical expressions based on this annotated corpus. In this test set, we particularly pay attention to inferences where the correct label differs between logical entailment and implicature and those contexts such as negations and conditionals where the entailment labels can be reversed. The baseline experiment with Japanese BERT models shows that our inference test set poses challenges for inference involving various types of numeral expressions.

Compositional Evaluation on Japanese Textual Entailment and Similarity
Hitomi Yanaka | Koji Mineshima
Transactions of the Association for Computational Linguistics, Volume 10

Natural Language Inference (NLI) and Semantic Textual Similarity (STS) are widely used benchmark tasks for compositional evaluation of pre-trained language models. Despite growing interest in linguistic universals, most NLI/STS studies have focused almost exclusively on English. In particular, there are no available multilingual NLI/STS datasets in Japanese, which is typologically different from English and can shed light on the currently controversial behavior of language models in matters such as sensitivity to word order and case particles. Against this background, we introduce JSICK, a Japanese NLI/STS dataset that was manually translated from the English dataset SICK. We also present a stress-test dataset for compositional inference, created by transforming syntactic structures of sentences in JSICK to investigate whether language models are sensitive to word order and case particles. We conduct baseline experiments on different pre-trained language models and compare the performance of multilingual models when applied to Japanese and other languages. The results of the stress-test experiments suggest that the current pre-trained language models are insensitive to word order and case marking.

2021

Assessing the Generalization Capacity of Pre-trained Language Models through Japanese Adversarial Natural Language Inference
Hitomi Yanaka | Koji Mineshima
Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

Despite the success of multilingual pre-trained language models, it remains unclear to what extent these models have human-like generalization capacity across languages. The aim of this study is to investigate the out-of-distribution generalization of pre-trained language models through Natural Language Inference (NLI) in Japanese, the typological properties of which are different from those of English. We introduce a synthetically generated Japanese NLI dataset, called the Japanese Adversarial NLI (JaNLI) dataset, which is inspired by the English HANS dataset and is designed to require understanding of Japanese linguistic phenomena and illuminate the vulnerabilities of models. Through a series of experiments to evaluate the generalization performance of both Japanese and multilingual BERT models, we demonstrate that there is much room to improve current models trained on Japanese NLI tasks. Furthermore, a comparison of human performance and model performance on the different types of garden-path sentences in the JaNLI dataset shows that structural phenomena that ease interpretation of garden-path sentences for human readers do not help models in the same way, highlighting a difference between human readers and the models.

Exploring Transitivity in Neural NLI Models through Veridicality
Hitomi Yanaka | Koji Mineshima | Kentaro Inui
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Despite the recent success of deep neural networks in natural language processing, the extent to which they can demonstrate human-like generalization capacities for natural language understanding remains unclear. We explore this issue in the domain of natural language inference (NLI), focusing on the transitivity of inference relations, a fundamental property for systematically drawing inferences. A model capturing transitivity can compose basic inference patterns and draw new inferences. We introduce an analysis method using synthetic and naturalistic NLI datasets involving clause-embedding verbs to evaluate whether models can perform transitivity inferences composed of veridical inferences and arbitrary inference types. We find that current NLI models do not perform consistently well on transitivity inference tasks, suggesting that they lack the generalization capacity for drawing composite inferences from provided training examples. The data and code for our analysis are publicly available at https://github.com/verypluming/transitivity.

SyGNS: A Systematic Generalization Testbed Based on Natural Language Semantics
Hitomi Yanaka | Koji Mineshima | Kentaro Inui
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Building a Video-and-Language Dataset with Human Actions for Multimodal Logical Inference
Riko Suzuki | Hitomi Yanaka | Koji Mineshima | Daisuke Bekki
Proceedings of the 1st Workshop on Multimodal Semantic Representations (MMSR)

This paper introduces a new video-and-language dataset with human actions for multimodal logical inference, which focuses on intentional and aspectual expressions that describe dynamic human actions. The dataset consists of 200 videos, 5,554 action labels, and 1,942 action triplets of the form (subject, predicate, object) that can be easily translated into logical semantic representations. The dataset is expected to be useful for evaluating multimodal inference systems between videos and semantically complicated sentences including negation and quantification.

Talking with the Theorem Prover to Interactively Solve Natural Language Inference
Atsushi Sumita | Yusuke Miyao | Koji Mineshima
Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation

2020

Do Neural Models Learn Systematicity of Monotonicity Inference in Natural Language?
Hitomi Yanaka | Koji Mineshima | Daisuke Bekki | Kentaro Inui
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Despite the success of language models using neural networks, it remains unclear to what extent neural models have the generalization ability to perform inferences. In this paper, we introduce a method for evaluating whether neural models can learn systematicity of monotonicity inference in natural language, namely, the regularity for performing arbitrary inferences with generalization on composition. We consider four aspects of monotonicity inferences and test whether the models can systematically interpret lexical and logical phenomena on different training/test splits. A series of experiments show that three neural models systematically draw inferences on unseen combinations of lexical and logical phenomena when the syntactic structures of the sentences are similar between the training and test sets. However, the performance of the models significantly decreases when the structures are slightly changed in the test set while retaining all vocabularies and constituents already appearing in the training set. This indicates that the generalization ability of neural models is limited to cases where the syntactic structures are nearly the same as those in the training set.

Logical Inferences with Comparatives and Generalized Quantifiers
Izumi Haruta | Koji Mineshima | Daisuke Bekki
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

Comparative constructions pose a challenge in Natural Language Inference (NLI), which is the task of determining whether a text entails a hypothesis. Comparatives are structurally complex in that they interact with other linguistic phenomena such as quantifiers, numerals, and lexical antonyms. In formal semantics, there is a rich body of work on comparatives and gradable expressions using the notion of degree. However, a logical inference system for comparatives has not been sufficiently developed for use in the NLI task. In this paper, we present a compositional semantics that maps various comparative constructions in English to semantic representations via Combinatory Categorial Grammar (CCG) parsers and combine it with an inference system based on automated theorem proving. We evaluate our system on three NLI datasets that contain complex logical inferences with comparatives, generalized quantifiers, and numerals. We show that the system outperforms previous logic-based systems as well as recent deep learning-based models.

Combining Event Semantics and Degree Semantics for Natural Language Inference
Izumi Haruta | Koji Mineshima | Daisuke Bekki
Proceedings of the 28th International Conference on Computational Linguistics

In formal semantics, there are two well-developed semantic frameworks: event semantics, which treats verbs and adverbial modifiers using the notion of event, and degree semantics, which analyzes adjectives and comparatives using the notion of degree. However, it is not obvious whether these frameworks can be combined to handle cases in which the phenomena in question are interacting with each other. Here, we study this issue by focusing on natural language inference (NLI). We implement a logic-based NLI system that combines event semantics and degree semantics and their interaction with lexical knowledge. We evaluate the system on various NLI datasets containing linguistically challenging problems. The results show that the system achieves high accuracies on these datasets in comparison with previous logic-based systems and deep-learning-based systems. This suggests that the two semantic frameworks can be combined consistently to handle various combinations of linguistic phenomena without compromising the advantage of either framework.

Development of a General-Purpose Categorial Grammar Treebank
Yusuke Kubota | Koji Mineshima | Noritsugu Hayashi | Shinya Okano
Proceedings of the Twelfth Language Resources and Evaluation Conference

This paper introduces ABC Treebank, a general-purpose categorial grammar (CG) treebank for Japanese. It is ‘general-purpose’ in the sense that it is not tailored to a specific variant of CG, but rather aims to offer a theory-neutral linguistic resource (as much as possible) which can be converted to different versions of CG (specifically, CCG and Type-Logical Grammar) relatively easily. In terms of linguistic analysis, it improves over the existing Japanese CG treebank (Japanese CCGBank) on the treatment of certain linguistic phenomena (passives, causatives, and control/raising predicates) for which the lexical specification of the syntactic information reflecting local dependencies turns out to be crucial. In this paper, we describe the underlying ‘theory’ dubbed ABC Grammar that is taken as a basis for our treebank, outline the general construction of the corpus, and report on some preliminary results applying the treebank in a semantic parsing system for generating logical representations of sentences.

2019

Automatic Generation of High Quality CCGbanks for Parser Domain Adaptation
Masashi Yoshikawa | Hiroshi Noji | Koji Mineshima | Daisuke Bekki
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We propose a new domain adaptation method for Combinatory Categorial Grammar (CCG) parsing, based on the idea of automatic generation of CCG corpora exploiting cheaper resources of dependency trees. Our solution is conceptually simple, and not relying on a specific parser architecture, making it applicable to the current best-performing parsers. We conduct extensive parsing experiments with detailed discussion; on top of existing benchmark datasets on (1) biomedical texts and (2) question sentences, we create experimental datasets of (3) speech conversation and (4) math problems. When applied to the proposed method, an off-the-shelf CCG parser shows significant performance gains, improving from 90.7% to 96.6% on speech conversation, and from 88.5% to 96.8% on math problems.

Multimodal Logical Inference System for Visual-Textual Entailment
Riko Suzuki | Hitomi Yanaka | Masashi Yoshikawa | Koji Mineshima | Daisuke Bekki
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

A large amount of research about multimodal inference across text and vision has been recently developed to obtain visually grounded word and sentence representations. In this paper, we use logic-based representations as unified meaning representations for texts and images and present an unsupervised multimodal logical inference system that can effectively prove entailment relations between them. We show that by combining semantic parsing and theorem proving, the system can handle semantically complex sentences for visual-textual inference.

HELP: A Dataset for Identifying Shortcomings of Neural Models in Monotonicity Reasoning
Hitomi Yanaka | Koji Mineshima | Daisuke Bekki | Kentaro Inui | Satoshi Sekine | Lasha Abzianidze | Johan Bos
Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)

Large crowdsourced datasets are widely used for training and evaluating neural models on natural language inference (NLI). Despite these efforts, neural models have a hard time capturing logical inferences, including those licensed by phrase replacements, so-called monotonicity reasoning. Since no large dataset has been developed for monotonicity reasoning, it is still unclear whether the main obstacle is the size of datasets or the model architectures themselves. To investigate this issue, we introduce a new dataset, called HELP, for handling entailments with lexical and logical phenomena. We add it to training data for the state-of-the-art neural models and evaluate them on test sets for monotonicity phenomena. The results showed that our data augmentation improved the overall accuracy. We also find that the improvement is better on monotonicity inferences with lexical replacements than on downward inferences with disjunction and modification. This suggests that some types of inferences can be improved by our data augmentation while others are immune to it.

Underspecification and interpretive parallelism in Dependent Type Semantics
Yusuke Kubota | Koji Mineshima | Robert Levine | Daisuke Bekki
Proceedings of the IWCS 2019 Workshop on Computing Semantics with Types, Frames and Related Structures

Questions in Dependent Type Semantics
Kazuki Watanabe | Koji Mineshima | Daisuke Bekki
Proceedings of the Sixth Workshop on Natural Language and Computer Science

Dependent Type Semantics (DTS; Bekki and Mineshima, 2017) is a proof-theoretic compositional dynamic semantics based on Dependent Type Theory. The semantic representations for declarative sentences in DTS are types, based on the propositions-as-types paradigm. While type-theoretic semantics for natural language based on dependent type theory has been developed by many authors, how to assign semantic representations to interrogative sentences has been a non-trivial problem. In this study, we show how to provide the semantics of interrogative sentences in DTS. The basic idea is to assign the same type to both declarative sentences and interrogative sentences, partly building on the recent proposal in Inquisitive Semantics. We use Combinatory Categorial Grammar (CCG) as a syntactic component of DTS and implement our compositional semantics for interrogative sentences using ccg2lambda, a semantic parsing platform based on CCG. Based on the idea that the relationship between questions and answers can be formulated as the task of Recognizing Textual Entailment (RTE), we implement our inference system using proof assistant Coq and show that our system can deal with a wide range of question-answer relationships discussed in the formal semantics literature, including those with polar questions, alternative questions, and wh-questions.

Can Neural Networks Understand Monotonicity Reasoning?
Hitomi Yanaka | Koji Mineshima | Daisuke Bekki | Kentaro Inui | Satoshi Sekine | Lasha Abzianidze | Johan Bos
Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

Monotonicity reasoning is one of the important reasoning skills for any intelligent natural language inference (NLI) model in that it requires the ability to capture the interaction between lexical and syntactic structures. Since no test set has been developed for monotonicity reasoning with wide coverage, it is still unclear whether neural models can perform monotonicity reasoning in a proper way. To investigate this issue, we introduce the Monotonicity Entailment Dataset (MED). Performance by state-of-the-art NLI models on the new test set is substantially worse, under 55%, especially on downward reasoning. In addition, analysis using a monotonicity-driven data augmentation method showed that these models might be limited in their generalization ability in upward and downward reasoning.

2018

Acquisition of Phrase Correspondences Using Natural Deduction Proofs
Hitomi Yanaka | Koji Mineshima | Pascual Martínez-Gómez | Daisuke Bekki
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

How to identify, extract, and use phrasal knowledge is a crucial problem for the task of Recognizing Textual Entailment (RTE). To solve this problem, we propose a method for detecting paraphrases via natural deduction proofs of semantic relations between sentence pairs. Our solution relies on a graph reformulation of partial variable unifications and an algorithm that induces subgraph alignments between meaning representations. Experiments show that our method can automatically detect various paraphrases that are absent from existing paraphrase databases. In addition, the detection of paraphrases using proof information improves the accuracy of RTE tasks.

Consistent CCG Parsing over Multiple Sentences for Improved Logical Reasoning
Masashi Yoshikawa | Koji Mineshima | Hiroshi Noji | Daisuke Bekki
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

In formal logic-based approaches to Recognizing Textual Entailment (RTE), a Combinatory Categorial Grammar (CCG) parser is used to parse input premises and hypotheses to obtain their logical formulas. Here, it is important that the parser processes the sentences consistently; failing to recognize the similar syntactic structure results in inconsistent predicate argument structures among them, in which case the succeeding theorem proving is doomed to failure. In this work, we present a simple method to extend an existing CCG parser to parse a set of sentences consistently, which is achieved with an inter-sentence modeling with Markov Random Fields (MRF). When combined with existing logic-based systems, our method always shows improvement in the RTE experiments on English and Japanese languages.

Neural sentence generation from formal semantics
Kana Manome | Masashi Yoshikawa | Hitomi Yanaka | Pascual Martínez-Gómez | Koji Mineshima | Daisuke Bekki
Proceedings of the 11th International Conference on Natural Language Generation

Sequence-to-sequence models have shown strong performance in a wide range of NLP tasks, yet their applications to sentence generation from logical representations are underdeveloped. In this paper, we present a sequence-to-sequence model for generating sentences from logical meaning representations based on event semantics. We use a semantic parsing system based on Combinatory Categorial Grammar (CCG) to obtain data annotated with logical formulas. We augment our sequence-to-sequence model with masking for predicates to constrain output sentences. We also propose a novel evaluation method for generation using Recognizing Textual Entailment (RTE). Combining parsing and generation, we test whether or not the output sentence entails the original text and vice versa. Experiments showed that our model outperformed a baseline with respect to both BLEU scores and accuracies in RTE.

2017

Determining Semantic Textual Similarity using Natural Deduction Proofs
Hitomi Yanaka | Koji Mineshima | Pascual Martínez-Gómez | Daisuke Bekki
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Determining semantic textual similarity is a core research subject in natural language processing. Since vector-based models for sentence representation often use shallow information, capturing accurate semantics is difficult. By contrast, logical semantic representations capture deeper levels of sentence semantics, but their symbolic nature does not offer graded notions of textual similarity. We propose a method for determining semantic textual similarity by combining shallow features with features extracted from natural deduction proofs of bidirectional entailment relations between sentence pairs. For the natural deduction proofs, we use ccg2lambda, a higher-order automatic inference system, which converts Combinatory Categorial Grammar (CCG) derivation trees into semantic representations and conducts natural deduction proofs. Experiments show that our system was able to outperform other logic-based systems and that features derived from the proofs are effective for learning textual similarity.

Visual Denotations for Recognizing Textual Entailment
Dan Han | Pascual Martínez-Gómez | Koji Mineshima
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

In the logic approach to Recognizing Textual Entailment, identifying phrase-to-phrase semantic relations is still an unsolved problem. Resources such as the Paraphrase Database offer limited coverage despite their large size whereas unsupervised distributional models of meaning often fail to recognize phrasal entailments. We propose to map phrases to their visual denotations and compare their meaning in terms of their images. We show that our approach is effective in the task of Recognizing Textual Entailment when combined with specific linguistic and logic features.

On-demand Injection of Lexical Knowledge for Recognising Textual Entailment
Pascual Martínez-Gómez | Koji Mineshima | Yusuke Miyao | Daisuke Bekki
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

We approach the recognition of textual entailment using logical semantic representations and a theorem prover. In this setup, lexical divergences that preserve semantic entailment between the source and target texts need to be explicitly stated. However, recognising subsentential semantic relations is not trivial. We address this problem by monitoring the proof of the theorem and detecting unprovable sub-goals that share predicate arguments with logical premises. If a linguistic relation exists, then an appropriate axiom is constructed on-demand and the theorem proving continues. Experiments show that this approach is effective and precise, producing a system that outperforms other logic-based systems and is competitive with state-of-the-art statistical methods.

The Challenge of Composition in Distributional and Formal Semantics
Ran Tian | Koji Mineshima | Pascual Martínez-Gómez
Proceedings of the IJCNLP 2017, Tutorial Abstracts

This is a tutorial proposal. The abstract is as follows: The principle of compositionality states that the meaning of a complete sentence must be explained in terms of the meanings of its subsentential parts; in other words, each syntactic operation should have a corresponding semantic operation. In recent years, it has become increasingly evident that distributional and formal semantics are complementary in addressing composition; while the distributional/vector-based approach can naturally measure semantic similarity (Mitchell and Lapata, 2010), the formal/symbolic approach has a long tradition within logic-based semantic frameworks (Montague, 1974) and can readily be connected to theorem provers or databases to perform complicated tasks. In this tutorial, we will cover recent efforts in extending word vectors to account for composition and reasoning, the various challenging phenomena observed in composition and addressed by formal semantics, and a hybrid approach that combines the merits of the two. An outline and an introduction to the instructors are found in the submission. Ran Tian has taught a tutorial at the Annual Meeting of the Association for Natural Language Processing in Japan, 2015. The estimated audience size was about one hundred. Only a limited part of the content of this tutorial is drawn from the previous one. Koji Mineshima has taught a one-week course at the 28th European Summer School in Logic, Language and Information (ESSLLI2016), together with Prof. Daisuke Bekki. Only a small part of that course's content is shared with this tutorial. Tutorials on “CCG Semantic Parsing” have been given at ACL2013, EMNLP2014, and AAAI2015. An upcoming tutorial on “Deep Learning for Semantic Composition” will be given at ACL2017. The contents of these tutorials are related to, but do not overlap with, our proposal.

2016

Building compositional semantics and higher-order inference system for a wide-coverage Japanese CCG parser
Koji Mineshima | Ribeka Tanaka | Pascual Martínez-Gómez | Yusuke Miyao | Daisuke Bekki
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

ccg2lambda: A Compositional Semantics System
Pascual Martínez-Gómez | Koji Mineshima | Yusuke Miyao | Daisuke Bekki
Proceedings of ACL-2016 System Demonstrations

Annotation and Analysis of Discourse Relations, Temporal Relations and Multi-Layered Situational Relations in Japanese Texts
Kimi Kaneko | Saku Sugawara | Koji Mineshima | Daisuke Bekki
Proceedings of the 12th Workshop on Asian Language Resources (ALR12)

This paper proposes a methodology for building a specialized Japanese data set for recognizing temporal relations and discourse relations. In addition to temporal and discourse relations, multi-layered situational relations that distinguish generic and specific states belonging to different layers in a discourse are annotated. Our methodology has been applied to 170 text fragments taken from Wikinews articles in Japanese. The validity of our methodology is evaluated and analyzed in terms of degree of annotator agreement and frequency of errors.

2015

Higher-order logical inference with compositional semantics
Koji Mineshima | Pascual Martínez-Gómez | Yusuke Miyao | Daisuke Bekki
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
