Younghoon Jeong
2022
KOLD: Korean Offensive Language Dataset
Younghoon Jeong | Juhyun Oh | Jongwon Lee | Jaimeen Ahn | Jihyung Moon | Sungjoon Park | Alice Oh
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Recent directions for offensive language detection are hierarchical modeling, identifying the type and the target of offensive language, and interpretability with offensive span annotation and prediction. These improvements are focused on English and do not transfer well to other languages because of cultural and linguistic differences. In this paper, we present the Korean Offensive Language Dataset (KOLD), comprising 40,429 comments annotated hierarchically with the type and the target of offensive language, accompanied by annotations of the corresponding text spans. We collect the comments from NAVER news and YouTube and provide the titles of the articles and videos as context for the annotation process. We use these annotated comments as training data for Korean BERT and RoBERTa models and find that they are effective at offensiveness detection, target classification, and target span detection, while leaving room for improvement in target group classification and offensive span detection. We discover that the target group distribution differs drastically from existing English datasets, and observe that providing the context information improves model performance in offensiveness detection (+0.3), target classification (+1.5), and target group classification (+13.1). We publicly release the dataset and baseline models.
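The modeling setup described above lends itself to a short illustration. Below is a minimal sketch of fine-tuning a Korean BERT for the binary offensiveness detection task on KOLD-style comments, pairing each comment with its article or video title as context, in line with the reported context gains. It assumes the Hugging Face transformers library and the klue/bert-base checkpoint; the example records and the field names (title, comment, offensive) are illustrative placeholders, not the dataset's actual schema.

import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed Korean BERT checkpoint; the paper's exact models may differ.
MODEL_NAME = "klue/bert-base"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Illustrative KOLD-style records: context title, comment, binary offensive label.
# Field names are hypothetical; in practice these would be Korean text.
examples = [
    {"title": "article or video title", "comment": "user comment", "offensive": 0},
    {"title": "another title", "comment": "another comment", "offensive": 1},
]

def encode(batch):
    # Encode (title, comment) as a sentence pair so the model sees the context.
    return tokenizer(
        [ex["title"] for ex in batch],
        [ex["comment"] for ex in batch],
        truncation=True,
        padding=True,
        return_tensors="pt",
    )

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loader = DataLoader(examples, batch_size=16, shuffle=True, collate_fn=lambda b: b)

model.train()
for batch in loader:
    inputs = encode(batch)
    labels = torch.tensor([ex["offensive"] for ex in batch])
    loss = model(**inputs, labels=labels).loss  # cross-entropy over 2 classes
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

Dropping the title arguments from the tokenizer call would reproduce a no-context baseline, which is the comparison behind the reported +0.3 / +1.5 / +13.1 gains.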
Assessing How Users Display Self-Disclosure and Authenticity in Conversation with Human-Like Agents: A Case Study of Luda Lee
Won Ik Cho | Soomin Kim | Eujeong Choi | Younghoon Jeong
Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022
There is an ongoing discussion on what makes humans more engaged when interacting with conversational agents. However, in the area of language processing, there has been a paucity of studies on how people react to agents and share their interactions with others. We address this issue by investigating user dialogues with human-like agents posted online and analyzing the dialogue patterns. We construct a taxonomy to discern the users’ self-disclosure in the dialogue and the communication authenticity displayed in the user posting. We annotate the in-the-wild data, examine the reliability of the proposed scheme, and discuss how the categorization can be utilized for future research and industrial development.
Evaluating How Users Game and Display Conversation with Human-Like Agents
Won Ik Cho | Soomin Kim | Eujeong Choi | Younghoon Jeong
Proceedings of the 3rd Workshop on Computational Approaches to Discourse
Recently, with the advent of high-performance generative language models, artificial agents that communicate directly with users have become more human-like. This development allows users to perform a diverse range of trials with the agents, and the responses are sometimes displayed online by users who share or show off their experiences. In this study, we explore dialogues with a social chatbot uploaded to an online community, with the aim of understanding how users game human-like agents and display their conversations. We assert that user postings can be investigated from two aspects, namely conversation topic and purpose of testing, and suggest a categorization scheme for the analysis. We analyze 639 dialogues to develop an annotation protocol for the evaluation, and measure agreement to demonstrate its validity. We find that the dialogue content does not necessarily reflect the purpose of testing, and that users come up with creative strategies to game the agent without being penalized.