Naoya Inoue


pdf bib
LPAttack: A Feasible Annotation Scheme for Capturing Logic Pattern of Attacks in Arguments
Farjana Sultana Mim | Naoya Inoue | Shoichi Naito | Keshav Singh | Kentaro Inui
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In argumentative discourse, persuasion is often achieved by refuting or attacking others’ arguments. Attacking an argument is not always straightforward and often consists of complex rhetorical moves in which arguers may agree with a logic of an argument while attacking another logic. Furthermore, an arguer may neither deny nor agree with any logics of an argument, instead ignore them and attack the main stance of the argument by providing new logics and presupposing that the new logics have more value or importance than the logics presented in the attacked argument. However, there are no studies in computational argumentation that capture such complex rhetorical moves in attacks or the presuppositions or value judgments in them. To address this gap, we introduce LPAttack, a novel annotation scheme that captures the common modes and complex rhetorical moves in attacks along with the implicit presuppositions and value judgments. Our annotation study shows moderate inter-annotator agreement, indicating that human annotation for the proposed scheme is feasible. We publicly release our annotated corpus and the annotation guidelines.

pdf bib
IRAC: A Domain-Specific Annotated Corpus of Implicit Reasoning in Arguments
Keshav Singh | Naoya Inoue | Farjana Sultana Mim | Shoichi Naito | Kentaro Inui
Proceedings of the Thirteenth Language Resources and Evaluation Conference

The task of implicit reasoning generation aims to help machines understand arguments by inferring plausible reasonings (usually implicit) between argumentative texts. While this task is easy for humans, machines still struggle to make such inferences and deduce the underlying reasoning. To solve this problem, we hypothesize that as human reasoning is guided by innate collection of domain-specific knowledge, it might be beneficial to create such a domain-specific corpus for machines. As a starting point, we create the first domain-specific resource of implicit reasonings annotated for a wide range of arguments, which can be leveraged to empower machines with better implicit reasoning generation ability. We carefully design an annotation framework to collect them on a large scale through crowdsourcing and show the feasibility of creating a such a corpus at a reasonable cost and high-quality. Our experiments indicate that models trained with domain-specific implicit reasonings significantly outperform domain-general models in both automatic and human evaluations. To facilitate further research towards implicit reasoning generation in arguments, we present an in-depth analysis of our corpus and crowdsourcing methodology, and release our materials (i.e., crowdsourcing guidelines and domain-specific resource of implicit reasonings).

pdf bib
TYPIC: A Corpus of Template-Based Diagnostic Comments on Argumentation
Shoichi Naito | Shintaro Sawada | Chihiro Nakagawa | Naoya Inoue | Kenshi Yamaguchi | Iori Shimizu | Farjana Sultana Mim | Keshav Singh | Kentaro Inui
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Providing feedback on the argumentation of the learner is essential for developing critical thinking skills, however, it requires a lot of time and effort. To mitigate the overload on teachers, we aim to automate a process of providing feedback, especially giving diagnostic comments which point out the weaknesses inherent in the argumentation. It is recommended to give specific diagnostic comments so that learners can recognize the diagnosis without misinterpretation. However, it is not obvious how the task of providing specific diagnostic comments should be formulated. We present a formulation of the task as template selection and slot filling to make an automatic evaluation easier and the behavior of the model more tractable. The key to the formulation is the possibility of creating a template set that is sufficient for practical use. In this paper, we define three criteria that a template set should satisfy: expressiveness, informativeness, and uniqueness, and verify the feasibility of creating a template set that satisfies these criteria as a first trial. We will show that it is feasible through an annotation study that converts diagnostic comments given in a text to a template format. The corpus used in the annotation study is publicly available.

pdf bib
Learning and Evaluating Character Representations in Novels
Naoya Inoue | Charuta Pethe | Allen Kim | Steven Skiena
Findings of the Association for Computational Linguistics: ACL 2022

We address the problem of learning fixed-length vector representations of characters in novels. Recent advances in word embeddings have proven successful in learning entity representations from short texts, but fall short on longer documents because they do not capture full book-level information. To overcome the weakness of such text-based embeddings, we propose two novel methods for representing characters: (i) graph neural network-based embeddings from a full corpus-based character network; and (ii) low-dimensional embeddings constructed from the occurrence pattern of characters in each novel. We test the quality of these character embeddings using a new benchmark suite to evaluate character representations, encompassing 12 different tasks. We show that our representation techniques combined with text-based embeddings lead to the best character representations, outperforming text-based embeddings in four tasks. Our dataset and evaluation script will be made publicly available to stimulate additional work in this area.


pdf bib
Two Training Strategies for Improving Relation Extraction over Universal Graph
Qin Dai | Naoya Inoue | Ryo Takahashi | Kentaro Inui
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

This paper explores how the Distantly Supervised Relation Extraction (DS-RE) can benefit from the use of a Universal Graph (UG), the combination of a Knowledge Graph (KG) and a large-scale text collection. A straightforward extension of a current state-of-the-art neural model for DS-RE with a UG may lead to degradation in performance. We first report that this degradation is associated with the difficulty in learning a UG and then propose two training strategies: (1) Path Type Adaptive Pretraining, which sequentially trains the model with different types of UG paths so as to prevent the reliance on a single type of UG path; and (2) Complexity Ranking Guided Attention mechanism, which restricts the attention span according to the complexity of a UG path so as to force the model to extract features not only from simple UG paths but also from complex ones. Experimental results on both biomedical and NYT10 datasets prove the robustness of our methods and achieve a new state-of-the-art result on the NYT10 dataset. The code and datasets used in this paper are available at

pdf bib
Exploring Methodologies for Collecting High-Quality Implicit Reasoning in Arguments
Keshav Singh | Farjana Sultana Mim | Naoya Inoue | Shoichi Naito | Kentaro Inui
Proceedings of the 8th Workshop on Argument Mining

Annotation of implicit reasoning (i.e., warrant) in arguments is a critical resource to train models in gaining deeper understanding and correct interpretation of arguments. However, warrants are usually annotated in unstructured form, having no restriction on their lexical structure which sometimes makes it difficult to interpret how warrants relate to any of the information given in claim and premise. Moreover, assessing and determining better warrants from the large variety of reasoning patterns of unstructured warrants becomes a formidable task. Therefore, in order to annotate warrants in a more interpretative and restrictive way, we propose two methodologies to annotate warrants in a semi-structured form. To the best of our knowledge, we are the first to show how such semi-structured warrants can be annotated on a large scale via crowdsourcing. We demonstrate through extensive quality evaluation that our methodologies enable collecting better quality warrants in comparison to unstructured annotations. To further facilitate research towards the task of explicating warrants in arguments, we release our materials publicly (i.e., crowdsourcing guidelines and collected warrants).

pdf bib
Summarize-then-Answer: Generating Concise Explanations for Multi-hop Reading Comprehension
Naoya Inoue | Harsh Trivedi | Steven Sinha | Niranjan Balasubramanian | Kentaro Inui
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

How can we generate concise explanations for multi-hop Reading Comprehension (RC)? The current strategies of identifying supporting sentences can be seen as an extractive question-focused summarization of the input text. However, these extractive explanations are not necessarily concise i.e. not minimally sufficient for answering a question. Instead, we advocate for an abstractive approach, where we propose to generate a question-focused, abstractive summary of input paragraphs and then feed it to an RC system. Given a limited amount of human-annotated abstractive explanations, we train the abstractive explainer in a semi-supervised manner, where we start from the supervised model and then train it further through trial and error maximizing a conciseness-promoted reward function. Our experiments demonstrate that the proposed abstractive explainer can generate more compact explanations than an extractive explainer with limited supervision (only 2k instances) while maintaining sufficiency.

pdf bib
Cleaning Dirty Books: Post-OCR Processing for Previously Scanned Texts
Allen Kim | Charuta Pethe | Naoya Inoue | Steve Skiena
Findings of the Association for Computational Linguistics: EMNLP 2021

Substantial amounts of work are required to clean large collections of digitized books for NLP analysis, both because of the presence of errors in the scanned text and the presence of duplicate volumes in the corpora. In this paper, we consider the issue of deduplication in the presence of optical character recognition (OCR) errors. We present methods to handle these errors, evaluated on a collection of 19,347 texts from the Project Gutenberg dataset and 96,635 texts from the HathiTrust Library. We demonstrate that improvements in language models now enable the detection and correction of OCR errors without consideration of the scanning image itself. The inconsistencies found by aligning pairs of scans of the same underlying work provides training data to build models for detecting and correcting errors. We identify the canonical version for each of 17,136 repeatedly-scanned books from 58,808 scans. Finally, we investigate methods to detect and correct errors in single-copy texts. We show that on average, our method corrects over six times as many errors as it introduces. We also provide interesting analysis on the relation between scanning quality and other factors such as location and publication year.


pdf bib
Modeling Event Salience in Narratives via Barthes’ Cardinal Functions
Takaki Otake | Sho Yokoi | Naoya Inoue | Ryo Takahashi | Tatsuki Kuribayashi | Kentaro Inui
Proceedings of the 28th International Conference on Computational Linguistics

Events in a narrative differ in salience: some are more important to the story than others. Estimating event salience is useful for tasks such as story generation, and as a tool for text analysis in narratology and folkloristics. To compute event salience without any annotations, we adopt Barthes’ definition of event salience and propose several unsupervised methods that require only a pre-trained language model. Evaluating the proposed methods on folktales with event salience annotation, we show that the proposed methods outperform baseline methods and find fine-tuning a language model on narrative texts is a key factor in improving the proposed methods.

pdf bib
R4C: A Benchmark for Evaluating RC Systems to Get the Right Answer for the Right Reason
Naoya Inoue | Pontus Stenetorp | Kentaro Inui
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Recent studies have revealed that reading comprehension (RC) systems learn to exploit annotation artifacts and other biases in current datasets. This prevents the community from reliably measuring the progress of RC systems. To address this issue, we introduce R4C, a new task for evaluating RC systems’ internal reasoning. R4C requires giving not only answers but also derivations: explanations that justify predicted answers. We present a reliable, crowdsourced framework for scalably annotating RC datasets with derivations. We create and publicly release the R4C dataset, the first, quality-assured dataset consisting of 4.6k questions, each of which is annotated with 3 reference derivations (i.e. 13.8k derivations). Experiments show that our automatic evaluation metrics using multiple reference derivations are reliable, and that R4C assesses different skills from an existing benchmark.


pdf bib
An Empirical Study of Span Representations in Argumentation Structure Parsing
Tatsuki Kuribayashi | Hiroki Ouchi | Naoya Inoue | Paul Reisert | Toshinori Miyoshi | Jun Suzuki | Kentaro Inui
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

For several natural language processing (NLP) tasks, span representation design is attracting considerable attention as a promising new technique; a common basis for an effective design has been established. With such basis, exploring task-dependent extensions for argumentation structure parsing (ASP) becomes an interesting research direction. This study investigates (i) span representation originally developed for other NLP tasks and (ii) a simple task-dependent extension for ASP. Our extensive experiments and analysis show that these representations yield high performance for ASP and provide some challenging types of instances to be parsed.

pdf bib
Unsupervised Learning of Discourse-Aware Text Representation for Essay Scoring
Farjana Sultana Mim | Naoya Inoue | Paul Reisert | Hiroki Ouchi | Kentaro Inui
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

Existing document embedding approaches mainly focus on capturing sequences of words in documents. However, some document classification and regression tasks such as essay scoring need to consider discourse structure of documents. Although some prior approaches consider this issue and utilize discourse structure of text for document classification, these approaches are dependent on computationally expensive parsers. In this paper, we propose an unsupervised approach to capture discourse structure in terms of coherence and cohesion for document embedding that does not require any expensive parser or annotation. Extrinsic evaluation results show that the document representation obtained from our approach improves the performance of essay Organization scoring and Argument Strength scoring.

pdf bib
When Choosing Plausible Alternatives, Clever Hans can be Clever
Pride Kavumba | Naoya Inoue | Benjamin Heinzerling | Keshav Singh | Paul Reisert | Kentaro Inui
Proceedings of the First Workshop on Commonsense Inference in Natural Language Processing

Pretrained language models, such as BERT and RoBERTa, have shown large improvements in the commonsense reasoning benchmark COPA. However, recent work found that many improvements in benchmarks of natural language understanding are not due to models learning the task, but due to their increasing ability to exploit superficial cues, such as tokens that occur more often in the correct answer than the wrong one. Are BERT’s and RoBERTa’s good performance on COPA also caused by this? We find superficial cues in COPA, as well as evidence that BERT exploits these cues.To remedy this problem, we introduce Balanced COPA, an extension of COPA that does not suffer from easy-to-exploit single token cues. We analyze BERT’s and RoBERTa’s performance on original and Balanced COPA, finding that BERT relies on superficial cues when they are present, but still achieves comparable performance once they are made ineffective, suggesting that BERT learns the task to a certain degree when forced to. In contrast, RoBERTa does not appear to rely on superficial cues.

pdf bib
Inject Rubrics into Short Answer Grading System
Tianqi Wang | Naoya Inoue | Hiroki Ouchi | Tomoya Mizumoto | Kentaro Inui
Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019)

Short Answer Grading (SAG) is a task of scoring students’ answers in examinations. Most existing SAG systems predict scores based only on the answers, including the model used as base line in this paper, which gives the-state-of-the-art performance. But they ignore important evaluation criteria such as rubrics, which play a crucial role for evaluating answers in real-world situations. In this paper, we present a method to inject information from rubrics into SAG systems. We implement our approach on top of word-level attention mechanism to introduce the rubric information, in order to locate information in each answer that are highly related to the score. Our experimental results demonstrate that injecting rubric information effectively contributes to the performance improvement and that our proposed model outperforms the state-of-the-art SAG model on the widely used ASAP-SAS dataset under low-resource settings.

pdf bib
Improving Evidence Detection by Leveraging Warrants
Keshav Singh | Paul Reisert | Naoya Inoue | Pride Kavumba | Kentaro Inui
Proceedings of the Second Workshop on Fact Extraction and VERification (FEVER)

Recognizing the implicit link between a claim and a piece of evidence (i.e. warrant) is the key to improving the performance of evidence detection. In this work, we explore the effectiveness of automatically extracted warrants for evidence detection. Given a claim and candidate evidence, our proposed method extracts multiple warrants via similarity search from an existing, structured corpus of arguments. We then attentively aggregate the extracted warrants, considering the consistency between the given argument and the acquired warrants. Although a qualitative analysis on the warrants shows that the extraction method needs to be improved, our results indicate that our method can still improve the performance of evidence detection.

pdf bib
Distantly Supervised Biomedical Knowledge Acquisition via Knowledge Graph Based Attention
Qin Dai | Naoya Inoue | Paul Reisert | Ryo Takahashi | Kentaro Inui
Proceedings of the Workshop on Extracting Structured Knowledge from Scientific Publications

The increased demand for structured scientific knowledge has attracted considerable attention in extracting scientific relation from the ever growing scientific publications. Distant supervision is widely applied approach to automatically generate large amounts of labelled data with low manual annotation cost. However, distant supervision inevitably accompanies the wrong labelling problem, which will negatively affect the performance of Relation Extraction (RE). To address this issue, (Han et al., 2018) proposes a novel framework for jointly training both RE model and Knowledge Graph Completion (KGC) model to extract structured knowledge from non-scientific dataset. In this work, we firstly investigate the feasibility of this framework on scientific dataset, specifically on biomedical dataset. Secondly, to achieve better performance on the biomedical dataset, we extend the framework with other competitive KGC models. Moreover, we proposed a new end-to-end KGC model to extend the framework. Experimental results not only show the feasibility of the framework on the biomedical dataset, but also indicate the effectiveness of our extensions, because our extended model achieves significant and consistent improvements on distant supervised RE as compared with baselines.

pdf bib
Annotating with Pros and Cons of Technologies in Computer Science Papers
Hono Shirai | Naoya Inoue | Jun Suzuki | Kentaro Inui
Proceedings of the Workshop on Extracting Structured Knowledge from Scientific Publications

This paper explores a task for extracting a technological expression and its pros/cons from computer science papers. We report ongoing efforts on an annotated corpus of pros/cons and an analysis of the nature of the automatic extraction task. Specifically, we show how to adapt the targeted sentiment analysis task for pros/cons extraction in computer science papers and conduct an annotation study. In order to identify the challenges of the automatic extraction task, we construct a strong baseline model and conduct an error analysis. The experiments show that pros/cons can be consistently annotated by several annotators, and that the task is challenging due to domain-specific knowledge. The annotated dataset is made publicly available for research purposes.


pdf bib
Improving Scientific Relation Classification with Task Specific Supersense
Qin Dai | Naoya Inoue | Paul Reisert | Kentaro Inui
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation

pdf bib
Feasible Annotation Scheme for Capturing Policy Argument Reasoning using Argument Templates
Paul Reisert | Naoya Inoue | Tatsuki Kuribayashi | Kentaro Inui
Proceedings of the 5th Workshop on Argument Mining

Most of the existing works on argument mining cast the problem of argumentative structure identification as classification tasks (e.g. attack-support relations, stance, explicit premise/claim). This paper goes a step further by addressing the task of automatically identifying reasoning patterns of arguments using predefined templates, which is called argument template (AT) instantiation. The contributions of this work are three-fold. First, we develop a simple, yet expressive set of easily annotatable ATs that can represent a majority of writer’s reasoning for texts with diverse policy topics while maintaining the computational feasibility of the task. Second, we create a small, but highly reliable annotated corpus of instantiated ATs on top of reliably annotated support and attack relations and conduct an annotation study. Third, we formulate the task of AT instantiation as structured prediction constrained by a feasible set of templates. Our evaluation demonstrates that we can annotate ATs with a reasonably high inter-annotator agreement, and the use of template-constrained inference is useful for instantiating ATs with only partial reasoning comprehension clues.


pdf bib
Generating Stylistically Consistent Dialog Responses with Transfer Learning
Reina Akama | Kazuaki Inada | Naoya Inoue | Sosuke Kobayashi | Kentaro Inui
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

We propose a novel, data-driven, and stylistically consistent dialog response generation system. To create a user-friendly system, it is crucial to make generated responses not only appropriate but also stylistically consistent. For leaning both the properties effectively, our proposed framework has two training stages inspired by transfer learning. First, we train the model to generate appropriate responses, and then we ensure that the responses have a specific style. Experimental results demonstrate that the proposed method produces stylistically consistent responses while maintaining the appropriateness of the responses learned in a general domain.

pdf bib
An RNN-based Binary Classifier for the Story Cloze Test
Melissa Roemmele | Sosuke Kobayashi | Naoya Inoue | Andrew Gordon
Proceedings of the 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics

The Story Cloze Test consists of choosing a sentence that best completes a story given two choices. In this paper we present a system that performs this task using a supervised binary classifier on top of a recurrent neural network to predict the probability that a given story ending is correct. The classifier is trained to distinguish correct story endings given in the training data from incorrect ones that we artificially generate. Our experiments evaluate different methods for generating these negative examples, as well as different embedding-based representations of the stories. Our best result obtains 67.2% accuracy on the test set, outperforming the existing top baseline of 58.5%.

pdf bib
Handling Multiword Expressions in Causality Estimation
Shota Sasaki | Sho Takase | Naoya Inoue | Naoaki Okazaki | Kentaro Inui
IWCS 2017 — 12th International Conference on Computational Semantics — Short papers


pdf bib
Modeling Context-sensitive Selectional Preference with Distributed Representations
Naoya Inoue | Yuichiroh Matsubayashi | Masayuki Ono | Naoaki Okazaki | Kentaro Inui
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

This paper proposes a novel problem setting of selectional preference (SP) between a predicate and its arguments, called as context-sensitive SP (CSP). CSP models the narrative consistency between the predicate and preceding contexts of its arguments, in addition to the conventional SP based on semantic types. Furthermore, we present a novel CSP model that extends the neural SP model (Van de Cruys, 2014) to incorporate contextual information into the distributed representations of arguments. Experimental results demonstrate that the proposed CSP model successfully learns CSP and outperforms the conventional SP model in coreference cluster ranking.


pdf bib
A Computational Approach for Generating Toulmin Model Argumentation
Paul Reisert | Naoya Inoue | Naoaki Okazaki | Kentaro Inui
Proceedings of the 2nd Workshop on Argumentation Mining


pdf bib
An Example-Based Approach to Difficult Pronoun Resolution
Canasai Kruengkrai | Naoya Inoue | Jun Sugiura | Kentaro Inui
Proceedings of the 28th Pacific Asia Conference on Language, Information and Computing


pdf bib
Coreference Resolution with ILP-based Weighted Abduction
Naoya Inoue | Ekaterina Ovchinnikova | Kentaro Inui | Jerry Hobbs
Proceedings of COLING 2012