Sungwon Han


2024

Platform-Invariant Topic Modeling via Contrastive Learning to Mitigate Platform-Induced Bias
Minseo Koo | Doeun Kim | Sungwon Han | Sungkyu Shaun Park
Findings of the Association for Computational Linguistics: EMNLP 2024

Cross-platform topic dissemination is a recurring research subject in media analysis, but such studies sometimes fail to capture the authentic topics because of platform-induced biases, which may arise when documents from multiple platforms are aggregated and fed to an existing topic model. This work examines the impact of unique platform characteristics on the performance of topic models and proposes a new approach to enhance the effectiveness of topic modeling. The data used in this study consist of 1.5 million posts collected with the keyword "ChatGPT" from three social media platforms. The proposed model reduces platform influence in topic models by developing a platform-invariant contrastive learning algorithm and removing platform-specific jargon word sets. The approach was validated through quantitative and qualitative experiments against standard and state-of-the-art topic models and outperformed them. This method can mitigate biases arising from platform influences when modeling topics from texts collected across various platforms.

2023

Unified Neural Topic Model via Contrastive Learning and Term Weighting
Sungwon Han | Mingi Shin | Sungkyu Park | Changwook Jung | Meeyoung Cha
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Two types of topic modeling predominate: generative methods that employ probabilistic latent models and clustering methods that identify semantically coherent groups. This paper presents UTopic (Unified neural Topic model via contrastive learning and term weighting), which combines the advantages of these two types. UTopic uses contrastive learning and term weighting to learn knowledge from a pretrained language model and discover influential terms from semantically coherent clusters. Experiments show that the generated topics have a high-quality topic-word distribution in terms of topic coherence, outperforming existing baselines across multiple topic coherence measures. We demonstrate how our model can be used as an add-on to existing topic models to improve their performance.

2020

A Risk Communication Event Detection Model via Contrastive Learning
Mingi Shin | Sungwon Han | Sungkyu Park | Meeyoung Cha
Proceedings of the 3rd NLP4IF Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda

This paper presents a time-topic cohesive model describing the communication patterns on the coronavirus pandemic from three Asian countries. The strength of our model is two-fold. First, it detects contextualized events based on topical and temporal information via contrastive learning. Second, it can be applied to multiple languages, enabling a comparison of risk communication across cultures. We present a case study and discuss future implications of the proposed model.