Minju Kim


2024

pdf bib
Towards Context-Based Violence Detection: A Korean Crime Dialogue Dataset
Minju Kim | Heuiyeen Yeen | Myoung-Wan Koo
Findings of the Association for Computational Linguistics: EACL 2024

In order to enhance the security of society, there is rising interest in artificial intelligence (AI) to help detect and classify in advanced violence in daily life. The field of violence detection has introduced various datasets, yet context-based violence detection predominantly focuses on vision data, with a notable lack of NLP datasets. To overcome this, this paper presents the first Korean dialogue dataset for classifying violence that occurs in online settings: the Korean Crime Dialogue Dataset (KCDD). KCDD contains 22,249 dialogues created by crowd workers assuming offline scenarios. It has four criminal classes that meet international legal standards and one clean class (Serious Threats, Extortion or Blackmail, Harassment in the Workplace, Other Harassment, and Clean Dialogue). Plus, we propose a strong baseline for the proposed dataset, Relationship-Aware BERT. The model shows that understanding varying relationships among interlocutors improves the performance of crime dialogue classification. We hope that the proposed dataset will be used to detect cases of violence and aid people in danger. The KCDD dataset and corresponding baseline implementations can be found at the following link: https://sites.google.com/view/kcdd.

2023

pdf bib
Enhancing Task-Oriented Dialog System with Subjective Knowledge: A Large Language Model-based Data Augmentation Framework
Haein Jung | Heuiyeen Yeen | Jeehyun Lee | Minju Kim | Namo Bang | Myoung-Wan Koo
Proceedings of The Eleventh Dialog System Technology Challenge

As Task-Oriented Dialog (TOD) systems have advanced, structured DB systems, which aim to collect relevant knowledge for answering user’s questions, have also progressed. Despite these advancements, these methods face challenges when dealing with subjective questions from users. To overcome this, DSTC11 released a subjective-knowledge-based TOD (SK-TOD) dataset and benchmark. This paper introduces a framework that effectively solves SK-TOD tasks by leveraging a Large Language Model (LLM). We demonstrate the proficient use of LLM for each sub-task, including an adapters-based method and knowledge-grounded data augmentation. Our proposed methods, which utilize LLM as an efficient tool, outperform baseline performance and approaches that directly use LLM as a one-step sub-task solver, showing superior task-specific optimization.

2022

pdf bib
BotsTalk: Machine-sourced Framework for Automatic Curation of Large-scale Multi-skill Dialogue Datasets
Minju Kim | Chaehyeong Kim | Yong Ho Song | Seung-won Hwang | Jinyoung Yeo
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

To build open-domain chatbots that are able to use diverse communicative skills, we propose a novel framework BotsTalk, where multiple agents grounded to the specific target skills participate in a conversation to automatically annotate multi-skill dialogues. We further present Blended Skill BotsTalk (BSBT), a large-scale multi-skill dialogue dataset comprising 300K conversations. Through extensive experiments, we demonstrate that our dataset can be effective for multi-skill dialogue systems which require an understanding of skill blending as well as skill grounding. Our code and data are available at https://github.com/convei-lab/BotsTalk.