Yohan Jo


2024

pdf bib
Model-based Preference Optimization in Abstractive Summarization without Human Feedback
Jaepill Choi | Kyubyung Chae | Jiwoo Song | Yohan Jo | Taesup Kim
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

In abstractive summarization, the challenge of producing concise and accurate summaries arises from the vast amount of information contained in the source document. Consequently, although Large Language Models (LLMs) can generate fluent text, they often introduce inaccuracies by hallucinating content not found in the original source. While supervised fine-tuning methods that maximize likelihood contribute to this issue, they do not consistently enhance the faithfulness of the summaries. Preference-based optimization methods, such as Direct Preference Optimization (DPO), can further refine the model to align with human preferences. However, these methods still heavily depend on costly human feedback. In this work, we introduce a novel and straightforward approach called Model-based Preference Optimization (MPO) to fine-tune LLMs for improved summarization abilities without any human feedback. By leveraging the model’s inherent summarization capabilities, we create a preference dataset that is fully generated by the model using different decoding strategies. Our experiments on standard summarization datasets and various metrics demonstrate that our proposed MPO significantly enhances the quality of generated summaries without relying on human feedback. The code is publicly available at https://github.com/cjaep/MPO.

pdf bib
Mitigating Hallucination in Abstractive Summarization with Domain-Conditional Mutual Information
Kyubyung Chae | Jaepill Choi | Yohan Jo | Taesup Kim
Findings of the Association for Computational Linguistics: NAACL 2024

A primary challenge in abstractive summarization is hallucination—the phenomenon where a model generates plausible text that is absent in the source text. We hypothesize that the domain (or topic) of the source text triggers the model to generate text that is highly probable in the domain, neglecting the details of the source text. To alleviate this model bias, we introduce a decoding strategy based on domain-conditional pointwise mutual information. This strategy adjusts the generation probability of each token by comparing it with the token’s marginal probability within the domain of the source text. According to evaluation on the XSUM dataset, our method demonstrates improvement in terms of faithfulness and source relevance.

pdf bib
Publicly Shareable Clinical Large Language Model Built on Synthetic Clinical Notes
Sunjun Kweon | Junu Kim | Jiyoun Kim | Sujeong Im | Eunbyeol Cho | Seongsu Bae | Jungwoo Oh | Gyubok Lee | Jong Hak Moon | Seng Chan You | Seungjin Baek | Chang Hoon Han | Yoon Bin Jung | Yohan Jo | Edward Choi
Findings of the Association for Computational Linguistics: ACL 2024

The development of large language models tailored for handling patients’ clinical notes is often hindered by the limited accessibility and usability of these notes due to strict privacy regulations.To address these challenges, we first create synthetic large-scale clinical notes using publicly available case reports extracted from biomedical literature.We then use these synthetic notes to train our specialized clinical large language model, Asclepius.While Asclepius is trained on synthetic data, we assess its potential performance in real-world applications by evaluating it using real clinical notes.We benchmark Asclepius against several other large language models, including GPT-3.5-turbo and other open-source alternatives. To further validate our approach using synthetic notes, we also compare Asclepius with its variants trained on real clinical notes. Our findings convincingly demonstrate that synthetic clinical notes can serve as viable substitutes for real ones when constructing high-performing clinical language models. This conclusion is supported by detailed evaluations conducted by both GPT-4 and medical professionals. All resources—including weights, codes, and data—used in the development of Asclepius will be made publicly accessible for future research.

2023

pdf bib
FactKG: Fact Verification via Reasoning on Knowledge Graphs
Jiho Kim | Sungjin Park | Yeonsu Kwon | Yohan Jo | James Thorne | Edward Choi
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In real world applications, knowledge graphs (KG) are widely used in various domains (e.g. medical applications and dialogue agents). However, for fact verification, KGs have not been adequately utilized as a knowledge source. KGs can be a valuable knowledge source in fact verification due to their reliability and broad applicability. A KG consists of nodes and edges which makes it clear how concepts are linked together, allowing machines to reason over chains of topics. However, there are many challenges in understanding how these machine-readable concepts map to information in text. To enable the community to better use KGs, we introduce a new dataset, FactKG: Fact Verificationvia Reasoning on Knowledge Graphs. It consists of 108k natural language claims with five types of reasoning: One-hop, Conjunction, Existence, Multi-hop, and Negation. Furthermore, FactKG contains various linguistic patterns, including colloquial style claims as well as written style claims to increase practicality. Lastly, we develop a baseline approach and analyze FactKG over these reasoning types. We believe FactKG can advance both reliability and practicality in KG-based fact verification.

pdf bib
Open-WikiTable : Dataset for Open Domain Question Answering with Complex Reasoning over Table
Sunjun Kweon | Yeonsu Kwon | Seonhee Cho | Yohan Jo | Edward Choi
Findings of the Association for Computational Linguistics: ACL 2023

Despite recent interest in open domain question answering (ODQA) over tables, many studies still rely on datasets that are not truly optimal for the task with respect to utilizing structural nature of table. These datasets assume answers reside as a single cell value and do not necessitate exploring over multiple cells such as aggregation, comparison, and sorting. Thus, we release Open-WikiTable, the first ODQA dataset that requires complex reasoning over tables. Open-WikiTable is built upon WikiSQL and WikiTableQuestions to be applicable in the open-domain setting. As each question is coupled with both textual answers and SQL queries, Open-WikiTable opens up a wide range of possibilities for future research, as both reader and parser methods can be applied. The dataset is publicly available.

pdf bib
Multi-User MultiWOZ: Task-Oriented Dialogues among Multiple Users
Yohan Jo | Xinyan Zhao | Arijit Biswas | Nikoletta Basiou | Vincent Auvray | Nikolaos Malandrakis | Angeliki Metallinou | Alexandros Potamianos
Findings of the Association for Computational Linguistics: EMNLP 2023

While most task-oriented dialogues assume conversations between the agent and one user at a time, dialogue systems are increasingly expected to communicate with multiple users simultaneously who make decisions collaboratively. To facilitate development of such systems, we release the Multi-User MultiWOZ dataset: task-oriented dialogues among two users and one agent. To collect this dataset, each user utterance from MultiWOZ 2.2 was replaced with a small chat between two users that is semantically and pragmatically consistent with the original user utterance, thus resulting in the same dialogue state and system response. These dialogues reflect interesting dynamics of collaborative decision-making in task-oriented scenarios, e.g., social chatter and deliberation. Supported by this data, we propose the novel task of multi-user contextual query rewriting: to rewrite a task-oriented chat between two users as a concise task-oriented query that retains only task-relevant information and that is directly consumable by the dialogue system. We demonstrate that in multi-user dialogues, using predicted rewrites substantially improves dialogue state tracking without modifying existing dialogue systems that are trained for single-user dialogues. Further, this method surpasses training a medium-sized model directly on multi-user dialogues and generalizes to unseen domains.

pdf bib
KG-GPT: A General Framework for Reasoning on Knowledge Graphs Using Large Language Models
Jiho Kim | Yeonsu Kwon | Yohan Jo | Edward Choi
Findings of the Association for Computational Linguistics: EMNLP 2023

While large language models (LLMs) have made considerable advancements in understanding and generating unstructured text, their application in structured data remains underexplored. Particularly, using LLMs for complex reasoning tasks on knowledge graphs (KGs) remains largely untouched. To address this, we propose KG-GPT, a multi-purpose framework leveraging LLMs for tasks employing KGs. KG-GPT comprises three steps: Sentence Segmentation, Graph Retrieval, and Inference, each aimed at partitioning sentences, retrieving relevant graph components, and deriving logical conclusions, respectively. We evaluate KG-GPT using KG-based fact verification and KGQA benchmarks, with the model showing competitive and robust performance, even outperforming several fully-supervised models. Our work, therefore, marks a significant step in unifying structured and unstructured data processing within the realm of LLMs.

pdf bib
From Values to Opinions: Predicting Human Behaviors and Stances Using Value-Injected Large Language Models
Dongjun Kang | Joonsuk Park | Yohan Jo | JinYeong Bak
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Being able to predict people’s opinions on issues and behaviors in realistic scenarios can be helpful in various domains, such as politics and marketing. However, conducting large-scale surveys like the European Social Survey to solicit people’s opinions on individual issues can incur prohibitive costs. Leveraging prior research showing influence of core human values on individual decisions and actions, we propose to use value-injected large language models (LLM) to predict opinions and behaviors. To this end, we present Value Injection Method (VIM), a collection of two methods—argument generation and question answering—designed to inject targeted value distributions into LLMs via fine-tuning. We then conduct a series of experiments on four tasks to test the effectiveness of VIM and the possibility of using value-injected LLMs to predict opinions and behaviors of people. We find that LLMs value-injected with variations of VIM substantially outperform the baselines. Also, the results suggest that opinions and behaviors can be better predicted using value-injected LLMs than the baseline approaches.

pdf bib
A Zero-Shot Approach for Multi-User Task-Oriented Dialog Generation
Shiv Surya | Yohan Jo | Arijit Biswas | Alexandros Potamianos
Proceedings of the 16th International Natural Language Generation Conference

Prior art investigating task-oriented dialog and automatic generation of such dialogs have focused on single-user dialogs between a single user and an agent. However, there is limited study on adapting such AI agents to multi-user conversations (involving multiple users and an agent). Multi-user conversations are richer than single-user conversations containing social banter and collaborative decision making. The most significant challenge impeding such studies is the lack of suitable multi-user task-oriented dialogs with annotations of user belief states and system actions. One potential solution is multi-user dialog generation from single-user data. Many single-user dialogs datasets already contain dialog state information (intents, slots), thus making them suitable candidates. In this work, we propose a novel approach for expanding single-user task-oriented dialogs (e.g. MultiWOZ) to multi-user dialogs in a zero-shot setting.

2022

pdf bib
Argument Mining for Review Helpfulness Prediction
Zaiqian Chen | Daniel Verdi do Amarante | Jenna Donaldson | Yohan Jo | Joonsuk Park
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

The importance of reliably determining the helpfulness of product reviews is rising as both helpful and unhelpful reviews continue to accumulate on e-commerce websites. And argumentational features—such as the structure of arguments and the types of underlying elementary units—have shown to be promising indicators of product review helpfulness. However, their adoption has been limited due to the lack of sufficient resources and large-scale experiments investigating their utility. To this end, we present the AMazon Argument Mining (AM2) corpus—a corpus of 878 Amazon reviews on headphones annotated according to a theoretical argumentation model designed to evaluate argument quality.Experiments show that employing argumentational features leads to statistically significant improvements over the state-of-the-art review helpfulness predictors under both text-only and text-and-image settings.

pdf bib
Status Biases in Deliberation Online: Evidence from a Randomized Experiment on ChangeMyView
Emaad Manzoor | Yohan Jo | Alan Montgomery
Findings of the Association for Computational Linguistics: EMNLP 2022

Status is widely used to incentivize user engagement online. However, visible status indicators could inadvertently bias online deliberation to favor high-status users. In this work, we design and deploy a randomized experiment on the ChangeMyView platform to quantify status biases in deliberation online. We find strong evidence of status bias: hiding status on ChangeMyView increases the persuasion rate of moderate-status users by 84% and decreases the persuasion rate of high-status users by 41% relative to the control group. We also find that the persuasive power of status is moderated by verbosity, suggesting that status is used as an information-processing heuristic under cognitive load. Finally, we find that a user’s status influences the argumentation behavior of other users they interact with in a manner that disadvantages low and moderate-status users.

pdf bib
Proceedings of the 9th Workshop on Argument Mining
Gabriella Lapesa | Jodi Schneider | Yohan Jo | Sougata Saha
Proceedings of the 9th Workshop on Argument Mining

2021

pdf bib
Classifying Argumentative Relations Using Logical Mechanisms and Argumentation Schemes
Yohan Jo | Seojin Bang | Chris Reed | Eduard Hovy
Transactions of the Association for Computational Linguistics, Volume 9

While argument mining has achieved significant success in classifying argumentative relations between statements (support, attack, and neutral), we have a limited computational understanding of logical mechanisms that constitute those relations. Most recent studies rely on black-box models, which are not as linguistically insightful as desired. On the other hand, earlier studies use rather simple lexical features, missing logical relations between statements. To overcome these limitations, our work classifies argumentative relations based on four logical and theory-informed mechanisms between two statements, namely, (i) factual consistency, (ii) sentiment coherence, (iii) causal relation, and (iv) normative relation. We demonstrate that our operationalization of these logical mechanisms classifies argumentative relations without directly training on data labeled with the relations, significantly better than several unsupervised baselines. We further demonstrate that these mechanisms also improve supervised classifiers through representation learning.

pdf bib
Knowledge-Enhanced Evidence Retrieval for Counterargument Generation
Yohan Jo | Haneul Yoo | JinYeong Bak | Alice Oh | Chris Reed | Eduard Hovy
Findings of the Association for Computational Linguistics: EMNLP 2021

Finding counterevidence to statements is key to many tasks, including counterargument generation. We build a system that, given a statement, retrieves counterevidence from diverse sources on the Web. At the core of this system is a natural language inference (NLI) model that determines whether a candidate sentence is valid counterevidence or not. Most NLI models to date, however, lack proper reasoning abilities necessary to find counterevidence that involves complex inference. Thus, we present a knowledge-enhanced NLI model that aims to handle causality- and example-based inference by incorporating knowledge graphs. Our NLI model outperforms baselines for NLI tasks, especially for instances that require the targeted inference. In addition, this NLI model further improves the counterevidence retrieval system, notably finding complex counterevidence better.

2020

pdf bib
Machine-Aided Annotation for Fine-Grained Proposition Types in Argumentation
Yohan Jo | Elijah Mayfield | Chris Reed | Eduard Hovy
Proceedings of the Twelfth Language Resources and Evaluation Conference

We introduce a corpus of the 2016 U.S. presidential debates and commentary, containing 4,648 argumentative propositions annotated with fine-grained proposition types. Modern machine learning pipelines for analyzing argument have difficulty distinguishing between types of propositions based on their factuality, rhetorical positioning, and speaker commitment. Inability to properly account for these facets leaves such systems inaccurate in understanding of fine-grained proposition types. In this paper, we demonstrate an approach to annotating for four complex proposition types, namely normative claims, desires, future possibility, and reported speech. We develop a hybrid machine learning and human workflow for annotation that allows for efficient and reliable annotation of complex linguistic phenomena, and demonstrate with preliminary analysis of rhetorical strategies and structure in presidential debates. This new dataset and method can support technical researchers seeking more nuanced representations of argument, as well as argumentation theorists developing new quantitative analyses.

pdf bib
Detecting Attackable Sentences in Arguments
Yohan Jo | Seojin Bang | Emaad Manzoor | Eduard Hovy | Chris Reed
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Finding attackable sentences in an argument is the first step toward successful refutation in argumentation. We present a first large-scale analysis of sentence attackability in online arguments. We analyze driving reasons for attacks in argumentation and identify relevant characteristics of sentences. We demonstrate that a sentence’s attackability is associated with many of these characteristics regarding the sentence’s content, proposition types, and tone, and that an external knowledge source can provide useful information about attackability. Building on these findings, we demonstrate that machine learning models can automatically detect attackable sentences in arguments, significantly better than several baselines and comparably well to laypeople.

pdf bib
Extracting Implicitly Asserted Propositions in Argumentation
Yohan Jo | Jacky Visser | Chris Reed | Eduard Hovy
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Argumentation accommodates various rhetorical devices, such as questions, reported speech, and imperatives. These rhetorical tools usually assert argumentatively relevant propositions rather implicitly, so understanding their true meaning is key to understanding certain arguments properly. However, most argument mining systems and computational linguistics research have paid little attention to implicitly asserted propositions in argumentation. In this paper, we examine a wide range of computational methods for extracting propositions that are implicitly asserted in questions, reported speech, and imperatives in argumentation. By evaluating the models on a corpus of 2016 U.S. presidential debates and online commentary, we demonstrate the effectiveness and limitations of the computational models. Our study may inform future research on argument mining and the semantics of these rhetorical devices in argumentation.

2019

pdf bib
Using Functional Schemas to Understand Social Media Narratives
Xinru Yan | Aakanksha Naik | Yohan Jo | Carolyn Rose
Proceedings of the Second Workshop on Storytelling

We propose a novel take on understanding narratives in social media, focusing on learning ”functional story schemas”, which consist of sets of stereotypical functional structures. We develop an unsupervised pipeline to extract schemas and apply our method to Reddit posts to detect schematic structures that are characteristic of different subreddits. We validate our schemas through human interpretation and evaluate their utility via a text classification task. Our experiments show that extracted schemas capture distinctive structural patterns in different subreddits, improving classification performance of several models by 2.4% on average. We also observe that these schemas serve as lenses that reveal community norms.

pdf bib
A Cascade Model for Proposition Extraction in Argumentation
Yohan Jo | Jacky Visser | Chris Reed | Eduard Hovy
Proceedings of the 6th Workshop on Argument Mining

We present a model to tackle a fundamental but understudied problem in computational argumentation: proposition extraction. Propositions are the basic units of an argument and the primary building blocks of most argument mining systems. However, they are usually substituted by argumentative discourse units obtained via surface-level text segmentation, which may yield text segments that lack semantic information necessary for subsequent argument mining processes. In contrast, our cascade model aims to extract complete propositions by handling anaphora resolution, text segmentation, reported speech, questions, imperatives, missing subjects, and revision. We formulate each task as a computational problem and test various models using a corpus of the 2016 U.S. presidential debates. We show promising performance for some tasks and discuss main challenges in proposition extraction.

2018

pdf bib
Attentive Interaction Model: Modeling Changes in View in Argumentation
Yohan Jo | Shivani Poddar | Byungsoo Jeon | Qinlan Shen | Carolyn Rosé | Graham Neubig
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

We present a neural architecture for modeling argumentative dialogue that explicitly models the interplay between an Opinion Holder’s (OH’s) reasoning and a challenger’s argument, with the goal of predicting if the argument successfully changes the OH’s view. The model has two components: (1) vulnerable region detection, an attention model that identifies parts of the OH’s reasoning that are amenable to change, and (2) interaction encoding, which identifies the relationship between the content of the OH’s reasoning and that of the challenger’s argument. Based on evaluation on discussions from the Change My View forum on Reddit, the two components work together to predict an OH’s change in view, outperforming several baselines. A posthoc analysis suggests that sentences picked out by the attention model are addressed more frequently by successful arguments than by unsuccessful ones.

2017

pdf bib
Roles and Success in Wikipedia Talk Pages: Identifying Latent Patterns of Behavior
Keith Maki | Michael Yoder | Yohan Jo | Carolyn Rosé
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In this work we investigate how role-based behavior profiles of a Wikipedia editor, considered against the backdrop of roles taken up by other editors in discussions, predict the success of the editor at achieving an impact on the associated article. We first contribute a new public dataset including a task predicting the success of Wikipedia editors involved in discussion, measured by an operationalization of the lasting impact of their edits in the article. We then propose a probabilistic graphical model that advances earlier work inducing latent discussion roles using the light supervision of success in the negotiation task. We evaluate the performance of the model and interpret findings of roles and group configurations that lead to certain outcomes on Wikipedia.

pdf bib
Modeling Dialogue Acts with Content Word Filtering and Speaker Preferences
Yohan Jo | Michael Yoder | Hyeju Jang | Carolyn Rosé
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We present an unsupervised model of dialogue act sequences in conversation. By modeling topical themes as transitioning more slowly than dialogue acts in conversation, our model de-emphasizes content-related words in order to focus on conversational function words that signal dialogue acts. We also incorporate speaker tendencies to use some acts more than others as an additional predictor of dialogue act prevalence beyond temporal dependencies. According to the evaluation presented on two dissimilar corpora, the CNET forum and NPS Chat corpus, the effectiveness of each modeling assumption is found to vary depending on characteristics of the data. De-emphasizing content-related words yields improvement on the CNET corpus, while utilizing speaker tendencies is advantageous on the NPS corpus. The components of our model complement one another to achieve robust performance on both corpora and outperform state-of-the-art baseline models.

2016

pdf bib
Metaphor Detection with Topic Transition, Emotion and Cognition in Context
Hyeju Jang | Yohan Jo | Qinlan Shen | Michael Miller | Seungwhan Moon | Carolyn Rosé
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

pdf bib
Metaphor Detection in Discourse
Hyeju Jang | Seungwhan Moon | Yohan Jo | Carolyn Rosé
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue