Hitomi Yanaka


2023

Jamp: Controlled Japanese Temporal Inference Dataset for Evaluating Generalization Capacity of Language Models
Tomoki Sugimoto | Yasumasa Onoe | Hitomi Yanaka
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)

Natural Language Inference (NLI) tasks involving temporal inference remain challenging for pre-trained language models (LMs). Although various datasets have been created for this task, they primarily focus on English and do not address the need for resources in other languages. It is unclear whether current LMs realize the generalization capacity for temporal inference across languages. In this paper, we present Jamp, a Japanese NLI benchmark focused on temporal inference. Our dataset includes a range of temporal inference patterns, which enables us to conduct fine-grained analysis. To begin the data annotation process, we create diverse inference templates based on the formal semantics test suites. We then automatically generate diverse NLI examples by using the Japanese case frame dictionary and well-designed templates while controlling the distribution of inference patterns and gold labels. We evaluate the generalization capacities of monolingual/multilingual LMs by splitting our dataset based on tense fragments (i.e., temporal inference patterns). Our findings demonstrate that LMs struggle with specific linguistic phenomena, such as habituality, indicating that there is potential for the development of more effective NLI models across languages.

Constructing Multilingual Code Search Dataset Using Neural Machine Translation
Ryo Sekizawa | Nan Duan | Shuai Lu | Hitomi Yanaka
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)

Code search is the task of finding program code that semantically matches a given natural language query. Even though some of the existing datasets for this task are multilingual on the programming language side, their query data are only in English. In this research, we create a multilingual code search dataset in four natural and four programming languages using a neural machine translation model. Using our dataset, we pre-train and fine-tune Transformer-based models and then evaluate them on multiple code search test sets. Our results show that the model pre-trained with all natural and programming language data performed best in most cases. By applying back-translation data filtering to our dataset, we demonstrate that the translation quality affects the model’s performance to a certain extent, but the data size matters more.

Analyzing Syntactic Generalization Capacity of Pre-trained Language Models on Japanese Honorific Conversion
Ryo Sekizawa | Hitomi Yanaka
Proceedings of the 12th Joint Conference on Lexical and Computational Semantics (*SEM 2023)

Using Japanese honorifics is challenging because it requires not only knowledge of the grammatical rules but also contextual information, such as social relationships. It remains unclear whether pre-trained large language models (LLMs) can flexibly handle Japanese honorifics like humans. To analyze this, we introduce an honorific conversion task that considers social relationships among people mentioned in a conversation. We construct a Japanese honorifics dataset from problem templates of various sentence structures to investigate the syntactic generalization capacity of GPT-3, one of the leading LLMs, on this task under two settings: fine-tuning and prompt learning. Our results showed that the fine-tuned GPT-3 performed better in a context-aware honorific conversion task than the prompt-based one. The fine-tuned model demonstrated overall syntactic generalizability towards compound honorific sentences, except when tested with the data involving direct speech.

Does Character-level Information Always Improve DRS-based Semantic Parsing?
Tomoya Kurosawa | Hitomi Yanaka
Proceedings of the 12th Joint Conference on Lexical and Computational Semantics (*SEM 2023)

Even in the era of massive language models, it has been suggested that character-level representations improve the performance of neural models. The state-of-the-art neural semantic parser for Discourse Representation Structures uses character-level representations, improving performance in the four languages (i.e., English, German, Dutch, and Italian) in the Parallel Meaning Bank dataset. However, how and why character-level information improves the parser’s performance remains unclear. This study provides an in-depth analysis of performance changes by order of character sequences. In the experiments, we compare F1-scores by shuffling the order and randomizing character sequences after testing the performance of character-level information. Our results indicate that incorporating character-level information does not improve the performance in English and German. In addition, we find that the parser is not sensitive to correct character order in Dutch. Nevertheless, performance improvements are observed when using character-level information.
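As a rough illustration of the kind of perturbation this abstract describes, the sketch below shuffles the characters inside each token while keeping the tokens themselves in place. It is only an assumption about what "shuffling the order" of character sequences could look like. The function names `shuffle_characters` and `shuffle_sentence` and the whitespace tokenization are hypothetical and are not taken from the paper.

```python
import random


def shuffle_characters(token: str, rng: random.Random) -> str:
    """Return the token with its characters in a random order."""
    chars = list(token)
    rng.shuffle(chars)
    return "".join(chars)


def shuffle_sentence(sentence: str, seed: int = 0) -> str:
    """Shuffle characters inside each whitespace-separated token.

    Token boundaries and token order are preserved; only the
    character order within a token is perturbed.
    """
    rng = random.Random(seed)  # fixed seed so the perturbation is repeatable
    return " ".join(shuffle_characters(tok, rng) for tok in sentence.split())


original = "the parser reads characters"
perturbed = shuffle_sentence(original)
print(perturbed)
```

Comparing a parser's F1-score on `original`-style and `perturbed`-style character inputs would then indicate how much the model relies on correct character order.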

Is Japanese CCGBank empirically correct? A case study of passive and causative constructions
Daisuke Bekki | Hitomi Yanaka
Proceedings of the 21st International Workshop on Treebanks and Linguistic Theories (TLT, GURT/SyntaxFest 2023)

The Japanese CCGBank serves as training and evaluation data for developing Japanese CCG parsers. However, since it is automatically generated from the Kyoto Corpus, a dependency treebank, its linguistic validity still needs to be sufficiently verified. In this paper, we focus on the analysis of passive/causative constructions in the Japanese CCGBank and show that, together with the compositional semantics of ccg2lambda, a semantic parsing system, it yields empirically wrong predictions for the nested construction of passives and causatives.

Medical Visual Textual Entailment for Numerical Understanding of Vision-and-Language Models
Hitomi Yanaka | Yuta Nakamura | Yuki Chida | Tomoya Kurosawa
Proceedings of the 5th Clinical Natural Language Processing Workshop

Assessing the capacity of numerical understanding of vision-and-language models over images and texts is crucial for real vision-and-language applications, such as systems for automated medical image analysis. We provide a visual reasoning dataset focusing on numerical understanding in the medical domain. The experiments using our dataset show that current vision-and-language models fail to perform numerical inference in the medical domain. However, the data augmentation with only a small amount of our dataset improves the model performance, while maintaining the performance in the general domain.

Knowledge Injection for Disease Names in Logical Inference between Japanese Clinical Texts
Natsuki Murakami | Mana Ishida | Yuta Takahashi | Hitomi Yanaka | Daisuke Bekki
Proceedings of the 5th Clinical Natural Language Processing Workshop

In the medical field, there are many clinical texts such as electronic medical records, and research on Japanese natural language processing using these texts has been conducted. One such line of research involves Recognizing Textual Entailment (RTE) in clinical texts using a semantic analysis and logical inference system, ccg2lambda. However, it is difficult for existing inference systems to correctly determine entailment relations if the input sentence contains medical domain-specific paraphrases such as disease names. In this study, we propose a method to supplement the equivalence relations of disease names as axioms by identifying candidate paraphrases that are missing during theorem proving. Candidate paraphrases are identified by using a model for the NER task for disease names and a disease name dictionary. We also construct an inference test set that requires knowledge injection of disease names and evaluate our inference system. Experiments showed that our inference system was able to correctly infer for 106 out of 149 inference test sets.

2022

Annotating Japanese Numeral Expressions for a Logical and Pragmatic Inference Dataset
Kana Koyano | Hitomi Yanaka | Koji Mineshima | Daisuke Bekki
Proceedings of the 18th Joint ACL - ISO Workshop on Interoperable Semantic Annotation within LREC2022

Numeral expressions in Japanese are characterized by the flexibility of quantifier positions and the variety of numeral suffixes. However, little work has been done to build annotated corpora focusing on these features and datasets for testing the understanding of Japanese numeral expressions. In this study, we build a corpus that annotates each numeral expression in an existing phrase structure-based Japanese treebank with its usage and numeral suffix types. We also construct an inference test set for numerical expressions based on this annotated corpus. In this test set, we particularly pay attention to inferences where the correct label differs between logical entailment and implicature and those contexts such as negations and conditionals where the entailment labels can be reversed. The baseline experiment with Japanese BERT models shows that our inference test set poses challenges for inference involving various types of numeral expressions.

Logical Inference for Counting on Semi-structured Tables
Tomoya Kurosawa | Hitomi Yanaka
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

Recently, the Natural Language Inference (NLI) task has been studied for semi-structured tables that do not have a strict format. Although neural approaches have achieved high performance in various types of NLI, including NLI between semi-structured tables and texts, they still have difficulty in performing a numerical type of inference, such as counting. To handle a numerical type of inference, we propose a logical inference system for reasoning between semi-structured tables and texts. We use logical representations as meaning representations for tables and texts and use model checking to handle a numerical type of inference between texts and tables. To evaluate the extent to which our system can perform inference with numerical comparatives, we make an evaluation protocol that focuses on numerical understanding between semi-structured tables and texts in English. We show that our system can more robustly perform inference between tables and texts that requires numerical understanding compared with current neural approaches.
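The idea of checking a counting statement against a table can be sketched in a few lines. This is a simplified illustration, not the system described in the abstract: the table contents, the `count_values` and `holds` helpers, and the encoding of a hypothesis as a (key, comparison, number) check are all assumptions made for the example.

```python
import operator
from typing import Callable, Dict, List

# Hypothetical semi-structured table: each key maps to a list of values.
table: Dict[str, List[str]] = {
    "Name": ["Example Band"],
    "Members": ["Alice", "Bob", "Carol"],
    "Genres": ["rock", "pop"],
}


def count_values(tbl: Dict[str, List[str]], key: str) -> int:
    """Count the values listed under a key (0 if the key is absent)."""
    return len(tbl.get(key, []))


def holds(tbl: Dict[str, List[str]], key: str,
          compare: Callable[[int, int], bool], n: int) -> bool:
    """Check whether the counting statement is true in the table (the model)."""
    return compare(count_values(tbl, key), n)


# "The band has more than two members."
more_than_two_members = holds(table, "Members", operator.gt, 2)
# "The band plays exactly three genres."
exactly_three_genres = holds(table, "Genres", operator.eq, 3)
print(more_than_two_members, exactly_three_genres)
```

Treating the table as a model and evaluating the statement against it makes the count explicit, in contrast to having a neural encoder approximate it.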

Compositional Semantics and Inference System for Temporal Order based on Japanese CCG
Tomoki Sugimoto | Hitomi Yanaka
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

Natural Language Inference (NLI) is the task of determining whether a premise entails a hypothesis. NLI with temporal order is a challenging task because tense and aspect are complex linguistic phenomena involving interactions with temporal adverbs and temporal connectives. To tackle this, temporal and aspectual inference has been analyzed in various ways in the field of formal semantics. However, a Japanese NLI system for temporal order based on the analysis of formal semantics has not been sufficiently developed. We present a logic-based NLI system that considers temporal order in Japanese based on compositional semantics via Combinatory Categorial Grammar (CCG) syntactic analysis. Our system performs inference involving temporal order by using axioms for temporal relations and automated theorem provers. We evaluate our system by experimenting with Japanese NLI datasets that involve temporal order. We show that our system outperforms previous logic-based systems as well as current deep learning-based models.

Compositional Evaluation on Japanese Textual Entailment and Similarity
Hitomi Yanaka | Koji Mineshima
Transactions of the Association for Computational Linguistics, Volume 10

Natural Language Inference (NLI) and Semantic Textual Similarity (STS) are widely used benchmark tasks for compositional evaluation of pre-trained language models. Despite growing interest in linguistic universals, most NLI/STS studies have focused almost exclusively on English. In particular, there are no available multilingual NLI/STS datasets in Japanese, which is typologically different from English and can shed light on the currently controversial behavior of language models in matters such as sensitivity to word order and case particles. Against this background, we introduce JSICK, a Japanese NLI/STS dataset that was manually translated from the English dataset SICK. We also present a stress-test dataset for compositional inference, created by transforming syntactic structures of sentences in JSICK to investigate whether language models are sensitive to word order and case particles. We conduct baseline experiments on different pre-trained language models and compare the performance of multilingual models when applied to Japanese and other languages. The results of the stress-test experiments suggest that the current pre-trained language models are insensitive to word order and case marking.

2021

Exploring Transitivity in Neural NLI Models through Veridicality
Hitomi Yanaka | Koji Mineshima | Kentaro Inui
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Despite the recent success of deep neural networks in natural language processing, the extent to which they can demonstrate human-like generalization capacities for natural language understanding remains unclear. We explore this issue in the domain of natural language inference (NLI), focusing on the transitivity of inference relations, a fundamental property for systematically drawing inferences. A model capturing transitivity can compose basic inference patterns and draw new inferences. We introduce an analysis method using synthetic and naturalistic NLI datasets involving clause-embedding verbs to evaluate whether models can perform transitivity inferences composed of veridical inferences and arbitrary inference types. We find that current NLI models do not perform consistently well on transitivity inference tasks, suggesting that they lack the generalization capacity for drawing composite inferences from provided training examples. The data and code for our analysis are publicly available at https://github.com/verypluming/transitivity.

Assessing the Generalization Capacity of Pre-trained Language Models through Japanese Adversarial Natural Language Inference
Hitomi Yanaka | Koji Mineshima
Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

Despite the success of multilingual pre-trained language models, it remains unclear to what extent these models have human-like generalization capacity across languages. The aim of this study is to investigate the out-of-distribution generalization of pre-trained language models through Natural Language Inference (NLI) in Japanese, the typological properties of which are different from those of English. We introduce a synthetically generated Japanese NLI dataset, called the Japanese Adversarial NLI (JaNLI) dataset, which is inspired by the English HANS dataset and is designed to require understanding of Japanese linguistic phenomena and illuminate the vulnerabilities of models. Through a series of experiments to evaluate the generalization performance of both Japanese and multilingual BERT models, we demonstrate that there is much room to improve current models trained on Japanese NLI tasks. Furthermore, a comparison of human performance and model performance on the different types of garden-path sentences in the JaNLI dataset shows that structural phenomena that ease interpretation of garden-path sentences for human readers do not help models in the same way, highlighting a difference between human readers and the models.

SyGNS: A Systematic Generalization Testbed Based on Natural Language Semantics
Hitomi Yanaka | Koji Mineshima | Kentaro Inui
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Do Grammatical Error Correction Models Realize Grammatical Generalization?
Masato Mita | Hitomi Yanaka
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Building a Video-and-Language Dataset with Human Actions for Multimodal Logical Inference
Riko Suzuki | Hitomi Yanaka | Koji Mineshima | Daisuke Bekki
Proceedings of the 1st Workshop on Multimodal Semantic Representations (MMSR)

This paper introduces a new video-and-language dataset with human actions for multimodal logical inference, which focuses on intentional and aspectual expressions that describe dynamic human actions. The dataset consists of 200 videos, 5,554 action labels, and 1,942 action triplets of the form (subject, predicate, object) that can be easily translated into logical semantic representations. The dataset is expected to be useful for evaluating multimodal inference systems between videos and semantically complicated sentences including negation and quantification.
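To make the step from an action triplet to a logical form concrete, here is a minimal sketch that renders a (subject, predicate, object) triplet as an event-semantics style formula. The output notation, the `triplet_to_formula` function and the example triplet are assumptions for illustration. The abstract does not show the dataset's actual annotation format or target representation.

```python
from typing import Tuple

Triplet = Tuple[str, str, str]


def triplet_to_formula(triplet: Triplet) -> str:
    """Render a (subject, predicate, object) triplet as a formula string.

    The predicate is treated as a property of an event e, and the subject
    and object are attached to that event through role relations.
    """
    subject, predicate, obj = triplet
    return f"exists e. ({predicate}(e) & subj(e, {subject}) & obj(e, {obj}))"


formula = triplet_to_formula(("person", "open", "door"))
print(formula)
```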

2020

Do Neural Models Learn Systematicity of Monotonicity Inference in Natural Language?
Hitomi Yanaka | Koji Mineshima | Daisuke Bekki | Kentaro Inui
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Despite the success of language models using neural networks, it remains unclear to what extent neural models have the generalization ability to perform inferences. In this paper, we introduce a method for evaluating whether neural models can learn systematicity of monotonicity inference in natural language, namely, the regularity for performing arbitrary inferences with generalization on composition. We consider four aspects of monotonicity inferences and test whether the models can systematically interpret lexical and logical phenomena on different training/test splits. A series of experiments show that three neural models systematically draw inferences on unseen combinations of lexical and logical phenomena when the syntactic structures of the sentences are similar between the training and test sets. However, the performance of the models significantly decreases when the structures are slightly changed in the test set while retaining all vocabularies and constituents already appearing in the training set. This indicates that the generalization ability of neural models is limited to cases where the syntactic structures are nearly the same as those in the training set.

2019

Multimodal Logical Inference System for Visual-Textual Entailment
Riko Suzuki | Hitomi Yanaka | Masashi Yoshikawa | Koji Mineshima | Daisuke Bekki
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

A large body of research on multimodal inference across text and vision has recently been developed to obtain visually grounded word and sentence representations. In this paper, we use logic-based representations as unified meaning representations for texts and images and present an unsupervised multimodal logical inference system that can effectively prove entailment relations between them. We show that by combining semantic parsing and theorem proving, the system can handle semantically complex sentences for visual-textual inference.

Can Neural Networks Understand Monotonicity Reasoning?
Hitomi Yanaka | Koji Mineshima | Daisuke Bekki | Kentaro Inui | Satoshi Sekine | Lasha Abzianidze | Johan Bos
Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

Monotonicity reasoning is one of the important reasoning skills for any intelligent natural language inference (NLI) model in that it requires the ability to capture the interaction between lexical and syntactic structures. Since no test set has been developed for monotonicity reasoning with wide coverage, it is still unclear whether neural models can perform monotonicity reasoning in a proper way. To investigate this issue, we introduce the Monotonicity Entailment Dataset (MED). Performance by state-of-the-art NLI models on the new test set is substantially worse, under 55%, especially on downward reasoning. In addition, analysis using a monotonicity-driven data augmentation method showed that these models might be limited in their generalization ability in upward and downward reasoning.

HELP: A Dataset for Identifying Shortcomings of Neural Models in Monotonicity Reasoning
Hitomi Yanaka | Koji Mineshima | Daisuke Bekki | Kentaro Inui | Satoshi Sekine | Lasha Abzianidze | Johan Bos
Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)

Large crowdsourced datasets are widely used for training and evaluating neural models on natural language inference (NLI). Despite these efforts, neural models have a hard time capturing logical inferences, including those licensed by phrase replacements, so-called monotonicity reasoning. Since no large dataset has been developed for monotonicity reasoning, it is still unclear whether the main obstacle is the size of datasets or the model architectures themselves. To investigate this issue, we introduce a new dataset, called HELP, for handling entailments with lexical and logical phenomena. We add it to training data for the state-of-the-art neural models and evaluate them on test sets for monotonicity phenomena. The results showed that our data augmentation improved the overall accuracy. We also find that the improvement is better on monotonicity inferences with lexical replacements than on downward inferences with disjunction and modification. This suggests that some types of inferences can be improved by our data augmentation while others are immune to it.

2018

Neural sentence generation from formal semantics
Kana Manome | Masashi Yoshikawa | Hitomi Yanaka | Pascual Martínez-Gómez | Koji Mineshima | Daisuke Bekki
Proceedings of the 11th International Conference on Natural Language Generation

Sequence-to-sequence models have shown strong performance in a wide range of NLP tasks, yet their applications to sentence generation from logical representations are underdeveloped. In this paper, we present a sequence-to-sequence model for generating sentences from logical meaning representations based on event semantics. We use a semantic parsing system based on Combinatory Categorial Grammar (CCG) to obtain data annotated with logical formulas. We augment our sequence-to-sequence model with masking for predicates to constrain output sentences. We also propose a novel evaluation method for generation using Recognizing Textual Entailment (RTE). Combining parsing and generation, we test whether or not the output sentence entails the original text and vice versa. Experiments showed that our model outperformed a baseline with respect to both BLEU scores and accuracies in RTE.

Acquisition of Phrase Correspondences Using Natural Deduction Proofs
Hitomi Yanaka | Koji Mineshima | Pascual Martínez-Gómez | Daisuke Bekki
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

How to identify, extract, and use phrasal knowledge is a crucial problem for the task of Recognizing Textual Entailment (RTE). To solve this problem, we propose a method for detecting paraphrases via natural deduction proofs of semantic relations between sentence pairs. Our solution relies on a graph reformulation of partial variable unifications and an algorithm that induces subgraph alignments between meaning representations. Experiments show that our method can automatically detect various paraphrases that are absent from existing paraphrase databases. In addition, the detection of paraphrases using proof information improves the accuracy of RTE tasks.

2017

Determining Semantic Textual Similarity using Natural Deduction Proofs
Hitomi Yanaka | Koji Mineshima | Pascual Martínez-Gómez | Daisuke Bekki
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Determining semantic textual similarity is a core research subject in natural language processing. Since vector-based models for sentence representation often use shallow information, capturing accurate semantics is difficult. By contrast, logical semantic representations capture deeper levels of sentence semantics, but their symbolic nature does not offer graded notions of textual similarity. We propose a method for determining semantic textual similarity by combining shallow features with features extracted from natural deduction proofs of bidirectional entailment relations between sentence pairs. For the natural deduction proofs, we use ccg2lambda, a higher-order automatic inference system, which converts Combinatory Categorial Grammar (CCG) derivation trees into semantic representations and conducts natural deduction proofs. Experiments show that our system was able to outperform other logic-based systems and that features derived from the proofs are effective for learning textual similarity.