Paul Reisert


2024

Flee the Flaw: Annotating the Underlying Logic of Fallacious Arguments Through Templates and Slot-filling
Irfan Robbani | Paul Reisert | Surawat Pothong | Naoya Inoue | Camélia Guerraoui | Wenzhi Wang | Shoichi Naito | Jungmin Choi | Kentaro Inui
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Prior research in computational argumentation has mainly focused on scoring the quality of arguments, with less attention on explicating logical errors. In this work, we introduce four sets of explainable templates for common informal logical fallacies designed to explicate a fallacy’s implicit logic. Using our templates, we conduct an annotation study on top of 400 fallacious arguments taken from the LOGIC dataset and achieve a high agreement score (Krippendorff’s 𝛼 of 0.54) and reasonable coverage (83%). Finally, we conduct an experiment for detecting the structure of fallacies and discover that state-of-the-art language models struggle with detecting fallacy templates (0.47 accuracy). To facilitate research on fallacies, we make our dataset and guidelines publicly available.

Designing Logic Pattern Templates for Counter-Argument Logical Structure Analysis
Shoichi Naito | Wenzhi Wang | Paul Reisert | Naoya Inoue | Camélia Guerraoui | Kenshi Yamaguchi | Jungmin Choi | Irfan Robbani | Surawat Pothong | Kentaro Inui
Findings of the Association for Computational Linguistics: EMNLP 2024

2023

Teach Me How to Argue: A Survey on NLP Feedback Systems in Argumentation
Camelia Guerraoui | Paul Reisert | Naoya Inoue | Farjana Sultana Mim | Keshav Singh | Jungmin Choi | Irfan Robbani | Shoichi Naito | Wenzhi Wang | Kentaro Inui
Proceedings of the 10th Workshop on Argument Mining

The use of argumentation in education has shown improvement in students’ critical thinking skills, and computational models for argumentation have been developed to further assist this process. Although these models are useful for evaluating the quality of an argument, they often cannot explain why a particular argument score was predicted, i.e., why the argument is good or bad, which makes it difficult to provide constructive feedback to users, e.g., students, so that they can strengthen their critical thinking skills. In this survey, we explore current NLP feedback systems by categorizing each into four important dimensions of feedback (Richness, Visualization, Interactivity and Personalization). We discuss limitations for each dimension and provide suggestions to enhance the power of feedback and explanations to ultimately improve user critical thinking skills.

2019

An Empirical Study of Span Representations in Argumentation Structure Parsing
Tatsuki Kuribayashi | Hiroki Ouchi | Naoya Inoue | Paul Reisert | Toshinori Miyoshi | Jun Suzuki | Kentaro Inui
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

For several natural language processing (NLP) tasks, span representation design is attracting considerable attention as a promising new technique; a common basis for an effective design has been established. With such a basis, exploring task-dependent extensions for argumentation structure parsing (ASP) becomes an interesting research direction. This study investigates (i) span representations originally developed for other NLP tasks and (ii) a simple task-dependent extension for ASP. Our extensive experiments and analysis show that these representations yield high performance for ASP and reveal some types of instances that remain challenging to parse.

Unsupervised Learning of Discourse-Aware Text Representation for Essay Scoring
Farjana Sultana Mim | Naoya Inoue | Paul Reisert | Hiroki Ouchi | Kentaro Inui
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

Existing document embedding approaches mainly focus on capturing sequences of words in documents. However, some document classification and regression tasks, such as essay scoring, need to consider the discourse structure of documents. Although some prior approaches consider this issue and utilize the discourse structure of text for document classification, these approaches depend on computationally expensive parsers. In this paper, we propose an unsupervised approach to capture discourse structure in terms of coherence and cohesion for document embedding that does not require any expensive parser or annotation. Extrinsic evaluation results show that the document representation obtained from our approach improves the performance of essay Organization scoring and Argument Strength scoring.

When Choosing Plausible Alternatives, Clever Hans can be Clever
Pride Kavumba | Naoya Inoue | Benjamin Heinzerling | Keshav Singh | Paul Reisert | Kentaro Inui
Proceedings of the First Workshop on Commonsense Inference in Natural Language Processing

Pretrained language models, such as BERT and RoBERTa, have shown large improvements in the commonsense reasoning benchmark COPA. However, recent work found that many improvements in benchmarks of natural language understanding are not due to models learning the task, but due to their increasing ability to exploit superficial cues, such as tokens that occur more often in the correct answer than the wrong one. Is BERT’s and RoBERTa’s good performance on COPA also caused by this? We find superficial cues in COPA, as well as evidence that BERT exploits these cues. To remedy this problem, we introduce Balanced COPA, an extension of COPA that does not suffer from easy-to-exploit single token cues. We analyze BERT’s and RoBERTa’s performance on original and Balanced COPA, finding that BERT relies on superficial cues when they are present, but still achieves comparable performance once they are made ineffective, suggesting that BERT learns the task to a certain degree when forced to. In contrast, RoBERTa does not appear to rely on superficial cues.

Improving Evidence Detection by Leveraging Warrants
Keshav Singh | Paul Reisert | Naoya Inoue | Pride Kavumba | Kentaro Inui
Proceedings of the Second Workshop on Fact Extraction and VERification (FEVER)

Recognizing the implicit link between a claim and a piece of evidence (i.e., a warrant) is the key to improving the performance of evidence detection. In this work, we explore the effectiveness of automatically extracted warrants for evidence detection. Given a claim and candidate evidence, our proposed method extracts multiple warrants via similarity search from an existing, structured corpus of arguments. We then attentively aggregate the extracted warrants, considering the consistency between the given argument and the acquired warrants. Although a qualitative analysis of the warrants shows that the extraction method needs to be improved, our results indicate that our method can still improve the performance of evidence detection.

Distantly Supervised Biomedical Knowledge Acquisition via Knowledge Graph Based Attention
Qin Dai | Naoya Inoue | Paul Reisert | Ryo Takahashi | Kentaro Inui
Proceedings of the Workshop on Extracting Structured Knowledge from Scientific Publications

The increased demand for structured scientific knowledge has drawn considerable attention to extracting scientific relations from the ever-growing body of scientific publications. Distant supervision is a widely applied approach to automatically generate large amounts of labelled data with low manual annotation cost. However, distant supervision inevitably brings with it the wrong labelling problem, which negatively affects the performance of Relation Extraction (RE). To address this issue, Han et al. (2018) propose a novel framework for jointly training both an RE model and a Knowledge Graph Completion (KGC) model to extract structured knowledge from a non-scientific dataset. In this work, we first investigate the feasibility of this framework on a scientific dataset, specifically a biomedical dataset. Second, to achieve better performance on the biomedical dataset, we extend the framework with other competitive KGC models. Moreover, we propose a new end-to-end KGC model to extend the framework. Experimental results not only show the feasibility of the framework on the biomedical dataset, but also indicate the effectiveness of our extensions, because our extended model achieves significant and consistent improvements on distantly supervised RE as compared with baselines.

Analytic Score Prediction and Justification Identification in Automated Short Answer Scoring
Tomoya Mizumoto | Hiroki Ouchi | Yoriko Isobe | Paul Reisert | Ryo Nagata | Satoshi Sekine | Kentaro Inui
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications

This paper provides an analytical assessment of student short answer responses with a view to potential benefits in pedagogical contexts. We first propose and formalize two novel analytical assessment tasks: analytic score prediction and justification identification, and then provide the first dataset created for analytic short answer scoring research. Subsequently, we present a neural baseline model and report our extensive empirical results to demonstrate how our dataset can be used to explore new and intriguing technical challenges in short answer scoring. The dataset is publicly available for research purposes.

2018

Improving Scientific Relation Classification with Task Specific Supersense
Qin Dai | Naoya Inoue | Paul Reisert | Kentaro Inui
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation

Feasible Annotation Scheme for Capturing Policy Argument Reasoning using Argument Templates
Paul Reisert | Naoya Inoue | Tatsuki Kuribayashi | Kentaro Inui
Proceedings of the 5th Workshop on Argument Mining

Most existing work on argument mining casts the problem of argumentative structure identification as classification tasks (e.g., attack-support relations, stance, explicit premise/claim). This paper goes a step further by addressing the task of automatically identifying reasoning patterns of arguments using predefined templates, which is called argument template (AT) instantiation. The contributions of this work are three-fold. First, we develop a simple, yet expressive set of easily annotatable ATs that can represent the majority of writers’ reasoning for texts with diverse policy topics while maintaining the computational feasibility of the task. Second, we create a small, but highly reliable annotated corpus of instantiated ATs on top of reliably annotated support and attack relations and conduct an annotation study. Third, we formulate the task of AT instantiation as structured prediction constrained by a feasible set of templates. Our evaluation demonstrates that we can annotate ATs with a reasonably high inter-annotator agreement, and that template-constrained inference is useful for instantiating ATs with only partial reasoning comprehension clues.

2015

A Computational Approach for Generating Toulmin Model Argumentation
Paul Reisert | Naoya Inoue | Naoaki Okazaki | Kentaro Inui
Proceedings of the 2nd Workshop on Argumentation Mining

Learning Sentence Ordering for Opinion Generation of Debate
Toshihiko Yanase | Toshinori Miyoshi | Kohsuke Yanai | Misa Sato | Makoto Iwayama | Yoshiki Niwa | Paul Reisert | Kentaro Inui
Proceedings of the 2nd Workshop on Argumentation Mining

2014

A Corpus Study for Identifying Evidence on Microblogs
Paul Reisert | Junta Mizuno | Miwa Kanno | Naoaki Okazaki | Kentaro Inui
Proceedings of LAW VIII - The 8th Linguistic Annotation Workshop