Kunwoo Park


2023

pdf bib
Detecting Contextomized Quotes in News Headlines by Contrastive Learning
Seonyeong Song | Hyeonho Song | Kunwoo Park | Jiyoung Han | Meeyoung Cha
Findings of the Association for Computational Linguistics: EACL 2023

Quotes are critical for establishing credibility in news articles. A direct quote enclosed in quotation marks has a strong visual appeal and is a sign of a reliable citation. Unfortunately, this journalistic practice is not strictly followed, and a quote in the headline is often “contextomized.” Such a quote uses words out of context in a way that alters the speaker’s intention so that there is no semantically matching quote in the body text. We present QuoteCSE, a contrastive learning framework that represents the embedding of news quotes based on domain-driven positive and negative samples to identify such an editorial strategy. The dataset and code are available at https://github.com/ssu-humane/contextomized-quote-contrastive.

pdf bib
K-HATERS: A Hate Speech Detection Corpus in Korean with Target-Specific Ratings
Chaewon Park | Soohwan Kim | Kyubyong Park | Kunwoo Park
Findings of the Association for Computational Linguistics: EMNLP 2023

Numerous datasets have been proposed to combat the spread of online hate. Despite these efforts, a majority of these resources are English-centric, primarily focusing on overt forms of hate. This research gap calls for developing high-quality corpora in diverse languages that also encapsulate more subtle hate expressions. This study introduces K-HATERS, a new corpus for hate speech detection in Korean, comprising approximately 192K news comments with target-specific offensiveness ratings. This resource is the largest offensive language corpus in Korean and is the first to offer target-specific ratings on a three-point Likert scale, enabling the detection of hate expressions in Korean across varying degrees of offensiveness. We conduct experiments showing the effectiveness of the proposed corpus, including a comparison with existing datasets. Additionally, to address potential noise and bias in human annotations, we explore a novel idea of adopting the Cognitive Reflection Test, which is widely used in social science for assessing an individual’s cognitive ability, as a proxy of labeling quality. Findings indicate that annotations from individuals with the lowest test scores tend to yield detection models that make biased predictions toward specific target groups and are less accurate. This study contributes to the NLP research on hate speech detection and resource construction. The code and dataset can be accessed at https://github.com/ssu-humane/K-HATERS.

2022

pdf bib
How does fake news use a thumbnail? CLIP-based Multimodal Detection on the Unrepresentative News Image
Hyewon Choi | Yejun Yoon | Seunghyun Yoon | Kunwoo Park
Proceedings of the Workshop on Combating Online Hostile Posts in Regional Languages during Emergency Situations

This study investigates how fake news use the thumbnail image for a news article. We aim at capturing the degree of semantic incongruity between news text and image by using the pretrained CLIP representation. Motivated by the stylistic distinctiveness in fake news text, we examine whether fake news tends to use an irrelevant image to the news content. Results show that fake news tends to have a high degree of semantic incongruity than general news. We further attempt to detect such image-text incongruity by training classification models on a newly generated dataset. A manual evaluation suggests our method can find news articles of which the thumbnail image is semantically irrelevant to news text with an accuracy of 0.8. We also release a new dataset of image and news text pairs with the incongruity label, facilitating future studies on the direction.

2021

pdf bib
Who Blames or Endorses Whom? Entity-to-Entity Directed Sentiment Extraction in News Text
Kunwoo Park | Zhufeng Pan | Jungseock Joo
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021