Hyeonho Song


2024

Detecting Offensive Language in an Open Chatbot Platform
Hyeonho Song | Jisu Hong | Chani Jung | Hyojin Chin | Mingi Shin | Yubin Choi | Junghoi Choi | Meeyoung Cha
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

While detecting offensive language in online spaces remains an important societal issue, there is still a significant gap in existing research and practical datasets specific to chatbots. Furthermore, many of the current efforts by service providers to automatically filter offensive language are vulnerable to users’ deliberate text manipulation tactics, such as misspelling words. In this study, we analyze offensive language patterns in real logs of 6,254,261 chat utterance pairs from the commercial chat service Simsimi, which cover a variety of conversation topics. Based on the observed patterns, we introduce a novel offensive language detection method: a contrastive learning model that embeds chat content with a random masking strategy. We show that this model outperforms existing models in detecting offensive language in open-domain chat conversations while also demonstrating robustness against users’ deliberate text manipulation tactics when using offensive language. We release our curated chatbot dataset to foster research on offensive language detection in open-domain conversations and share lessons learned from mitigating offensive language on a live platform.

2023

Detecting Contextomized Quotes in News Headlines by Contrastive Learning
Seonyeong Song | Hyeonho Song | Kunwoo Park | Jiyoung Han | Meeyoung Cha
Findings of the Association for Computational Linguistics: EACL 2023

Quotes are critical for establishing credibility in news articles. A direct quote enclosed in quotation marks has a strong visual appeal and is a sign of a reliable citation. Unfortunately, this journalistic practice is not strictly followed, and a quote in the headline is often “contextomized.” Such a quote uses words out of context in a way that alters the speaker’s intention so that there is no semantically matching quote in the body text. We present QuoteCSE, a contrastive learning framework that represents the embedding of news quotes based on domain-driven positive and negative samples to identify such an editorial strategy. The dataset and code are available at https://github.com/ssu-humane/contextomized-quote-contrastive.