Ohjoon Kwon
2024
SLM as Guardian: Pioneering AI Safety with Small Language Model
Ohjoon Kwon | Donghyeon Jeon | Nayoung Choi | Gyu-Hwung Cho | Hwiyeol Jo | Changbong Kim | Hyunwoo Lee | Inho Kang | Sun Kim | Taiwoo Park
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
Most prior safety research on large language models (LLMs) has focused on enhancing the alignment of LLMs to better suit the safety requirements of their use cases. However, internalizing such safeguard features into larger models brings challenges of higher training cost and unintended degradation of helpfulness. In this paper, we leverage a smaller LLM for both harmful query detection and safeguard response generation. We introduce our safety requirements and the taxonomy of harmfulness categories, and then propose a multi-task learning mechanism fusing the two tasks into a single model. We demonstrate the effectiveness of our approach, which matches or surpasses publicly available LLMs in harmful query detection and safeguard response performance.
2021
Handling Out-Of-Vocabulary Problem in Hangeul Word Embeddings
Ohjoon Kwon | Dohyun Kim | Soo-Ryeon Lee | Junyoung Choi | SangKeun Lee
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume
Word embedding is considered an essential factor in improving the performance of various Natural Language Processing (NLP) models. However, it is hardly applicable to real-world datasets, as word embedding is generally studied with a well-refined corpus. Notably, in Hangeul, the unique Korean writing system, various kinds of Out-Of-Vocabulary (OOV) words arise from typos. In this paper, we propose a Hangeul word embedding model that is robust against typos while maintaining high performance. The proposed model utilizes a Convolutional Neural Network (CNN) architecture with a channel attention mechanism that learns to infer the original word embeddings. The model is trained on a dataset that consists of a mix of typos and correct words. To demonstrate the effectiveness of the proposed model, we conduct three kinds of intrinsic and extrinsic tasks. While the existing embedding models fail to maintain stable performance as the noise level increases, the proposed model shows stable performance.