Meeyoung Cha


2023

Unified Neural Topic Model via Contrastive Learning and Term Weighting
Sungwon Han | Mingi Shin | Sungkyu Park | Changwook Jung | Meeyoung Cha
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Two types of topic modeling predominate: generative methods that employ probabilistic latent models and clustering methods that identify semantically coherent groups. This paper presents UTopic (Unified neural Topic model via contrastive learning and term weighting), which combines the advantages of these two types. UTopic uses contrastive learning and term weighting to learn knowledge from a pretrained language model and to discover influential terms within semantically coherent clusters. Experiments show that the generated topics have a high-quality topic-word distribution, outperforming existing baselines across multiple topic coherence measures. We also demonstrate how our model can be used as an add-on to existing topic models to improve their performance.
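The abstract does not include reference code; purely as an illustration of the "semantic clustering plus term weighting" recipe it describes (not the authors' UTopic implementation), the sketch below clusters document embeddings and ranks terms per cluster with a simple class-based TF-IDF-style weighting. The function name, weighting scheme, and hyperparameters are assumptions; the embeddings are assumed to come from a (contrastively trained) pretrained encoder.

```python
# Hedged sketch: cluster document embeddings, then score terms per cluster.
# This is NOT the UTopic implementation; it only illustrates the general
# "semantic clustering + term weighting" idea described in the abstract.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer

def topic_words(docs, embeddings, n_topics=10, top_k=10):
    """Assumed inputs: raw documents and embeddings from a (contrastively
    trained) encoder; returns the top-k weighted terms per cluster."""
    labels = KMeans(n_clusters=n_topics, n_init=10).fit_predict(embeddings)
    counts = CountVectorizer(stop_words="english").fit(docs)
    vocab = np.array(counts.get_feature_names_out())
    X = counts.transform(docs).toarray()

    topics = []
    for t in range(n_topics):
        tf = X[labels == t].sum(axis=0)        # term frequency within cluster t
        df = (X > 0).sum(axis=0) + 1           # document frequency over the corpus
        weight = tf * np.log(len(docs) / df)   # simple c-TF-IDF-style weighting
        topics.append(vocab[np.argsort(weight)[::-1][:top_k]].tolist())
    return topics
```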

SQuARe: A Large-Scale Dataset of Sensitive Questions and Acceptable Responses Created through Human-Machine Collaboration
Hwaran Lee | Seokhee Hong | Joonsuk Park | Takyoung Kim | Meeyoung Cha | Yejin Choi | Byoungpil Kim | Gunhee Kim | Eun-Ju Lee | Yong Lim | Alice Oh | Sangchul Park | Jung-Woo Ha
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The potential social harms that large language models pose, such as generating offensive content and reinforcing biases, are steeply rising. Existing work focuses on coping with this concern in interactions with ill-intentioned users, such as those who explicitly produce hate speech or elicit harmful responses. However, discussions on sensitive issues can become toxic even when the users are well-intentioned. For safer models in such scenarios, we present the Sensitive Questions and Acceptable Response (SQuARe) dataset, a large-scale Korean dataset of 49k sensitive questions with 42k acceptable and 46k non-acceptable responses. The dataset was constructed with HyperCLOVA in a human-in-the-loop manner based on real news headlines. Experiments show that acceptable response generation improves significantly for HyperCLOVA and GPT-3, demonstrating the efficacy of this dataset.

Detecting Contextomized Quotes in News Headlines by Contrastive Learning
Seonyeong Song | Hyeonho Song | Kunwoo Park | Jiyoung Han | Meeyoung Cha
Findings of the Association for Computational Linguistics: EACL 2023

Quotes are critical for establishing credibility in news articles. A direct quote enclosed in quotation marks has strong visual appeal and signals a reliable citation. Unfortunately, this journalistic practice is not strictly followed, and a quote in the headline is often “contextomized”: the quote uses the speaker’s words out of context in a way that alters the intended meaning, so no semantically matching quote appears in the body text. To identify such an editorial strategy, we present QuoteCSE, a contrastive learning framework that learns embeddings of news quotes from domain-driven positive and negative samples. The dataset and code are available at https://github.com/ssu-humane/contextomized-quote-contrastive.
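The authors' code lives in the repository linked above; as a minimal illustration of the kind of contrastive objective the abstract mentions (not the actual QuoteCSE loss), an InfoNCE-style sketch over paired headline-quote and body-quote embeddings might look like the following. The function name, pairing assumption, and temperature are hypothetical.

```python
# Minimal InfoNCE-style contrastive loss sketch (not the QuoteCSE code).
# Assumes h_head[i] and h_body[i] are embeddings of a matching quote pair,
# so every other row in the batch serves as a negative.
import torch
import torch.nn.functional as F

def contrastive_loss(h_head, h_body, temperature=0.05):
    h_head = F.normalize(h_head, dim=-1)
    h_body = F.normalize(h_body, dim=-1)
    logits = h_head @ h_body.t() / temperature                      # pairwise cosine similarities
    targets = torch.arange(h_head.size(0), device=h_head.device)    # i-th head matches i-th body
    return F.cross_entropy(logits, targets)
```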

2020

A Risk Communication Event Detection Model via Contrastive Learning
Mingi Shin | Sungwon Han | Sungkyu Park | Meeyoung Cha
Proceedings of the 3rd NLP4IF Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda

This paper presents a time-topic cohesive model describing communication patterns around the coronavirus pandemic in three Asian countries. The strength of our model is twofold. First, it detects contextualized events based on topical and temporal information via contrastive learning. Second, it can be applied to multiple languages, enabling a comparison of risk communication across cultures. We present a case study and discuss future implications of the proposed model.
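The abstract only summarizes the model at a high level; as a toy illustration of combining topical and temporal signals (not the authors' method), the sketch below scores each time bucket by how tightly its post embeddings cohere relative to the corpus background, a rough proxy for an emerging event. All names and the scoring heuristic are assumptions.

```python
# Toy "time-topic cohesion" score sketch (not the paper's model): for each
# time bucket, measure how tightly that bucket's post embeddings cluster
# compared with the corpus-wide centroid.
import numpy as np

def cohesion_by_day(embeddings, day_ids):
    """embeddings: (n_posts, d) array; day_ids: per-post day index array."""
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    global_centroid = emb.mean(axis=0)
    scores = {}
    for day in np.unique(day_ids):
        day_emb = emb[day_ids == day]
        centroid = day_emb.mean(axis=0)
        within = (day_emb @ centroid).mean()   # cohesion inside the day
        drift = centroid @ global_centroid     # similarity to the background
        scores[int(day)] = float(within - drift)   # high value = event-like spike
    return scores
```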

2019

The Fallacy of Echo Chambers: Analyzing the Political Slants of User-Generated News Comments in Korean Media
Jiyoung Han | Youngin Lee | Junbum Lee | Meeyoung Cha
Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)

This study analyzes the political slants of user comments on Korean partisan media. We built a BERT-based classifier to detect the political leaning of short comments using semi-unsupervised deep learning methods, achieving an F1 score of 0.83. Classifying 21.6K comments, we found a strong presence of conservative bias on both conservative and liberal news outlets. Moreover, this study reveals an asymmetry across the partisan spectrum: more liberals (48.0%) than conservatives (23.6%) comment not only on news stories resonating with their political perspectives but also on those challenging their viewpoints. These findings advance the current understanding of online echo chambers.
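The abstract does not specify the backbone or label set; purely as a hedged sketch of a BERT-based short-comment classifier of this kind (not the paper's exact setup), inference with a generic multilingual BERT and an assumed three-way leaning head could look like this. The model choice, number of labels, and sequence length are assumptions.

```python
# Hedged sketch of a BERT-based comment-leaning classifier (not the paper's
# exact configuration). Backbone, label set, and hyperparameters are assumed.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=3  # e.g., liberal / neutral / conservative (assumed)
)
model.eval()

def predict_leaning(comment: str) -> int:
    """Return the index of the predicted leaning class for one short comment."""
    inputs = tokenizer(comment, truncation=True, max_length=128, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return int(logits.argmax(dim=-1))
```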