Joon-Young Choi


2023

SMoP: Towards Efficient and Effective Prompt Tuning with Sparse Mixture-of-Prompts
Joon-Young Choi | Junho Kim | Jun-Hyung Park | Wing-Lam Mok | SangKeun Lee
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Prompt tuning has emerged as a successful parameter-efficient alternative to the full fine-tuning of language models. However, prior works on prompt tuning often utilize long soft prompts of up to 100 tokens to improve performance, overlooking the inefficiency associated with extended inputs. In this paper, we propose a novel prompt tuning method SMoP (Sparse Mixture-of-Prompts) that utilizes short soft prompts for efficient training and inference while maintaining performance gains typically induced from longer soft prompts. To achieve this, SMoP employs a gating mechanism to train multiple short soft prompts specialized in handling different subsets of the data, providing an alternative to relying on a single long soft prompt to cover the entire data. Experimental results demonstrate that SMoP outperforms baseline methods while reducing training and inference costs. We release our code at https://github.com/jyjohnchoi/SMoP.
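
The gating idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation (see the released repository for that). The class name `SparsePromptRouter`, routing on the mean of the input token embeddings, and hard top-1 selection are all assumptions made for illustration; training of the gate and prompts is omitted.

```python
import random


def make_matrix(rows, cols, rng):
    """Small random matrix as nested lists (stand-in for learned parameters)."""
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


class SparsePromptRouter:
    """Hypothetical sketch: pick one short soft prompt per input via a gate."""

    def __init__(self, num_prompts, prompt_len, dim, seed=0):
        rng = random.Random(seed)
        # Several short soft prompts, each of shape (prompt_len, dim).
        self.prompts = [make_matrix(prompt_len, dim, rng) for _ in range(num_prompts)]
        # Linear gate: one score vector of size dim per prompt (assumed form).
        self.gate = make_matrix(num_prompts, dim, rng)

    def route(self, token_embeddings):
        """Return the index of the highest-scoring prompt for this input."""
        n = len(token_embeddings)
        dim = len(token_embeddings[0])
        # Assumption: summarize the input by averaging its token embeddings.
        mean = [sum(tok[d] for tok in token_embeddings) / n for d in range(dim)]
        scores = [sum(w * x for w, x in zip(row, mean)) for row in self.gate]
        return max(range(len(scores)), key=scores.__getitem__)

    def forward(self, token_embeddings):
        """Prepend only the selected short prompt to the input embeddings."""
        k = self.route(token_embeddings)
        return k, self.prompts[k] + token_embeddings


router = SparsePromptRouter(num_prompts=4, prompt_len=5, dim=8)
inputs = [[0.1] * 8 for _ in range(20)]  # 20 dummy token embeddings
chosen, extended = router.forward(inputs)
print(chosen, len(extended))
```

In this sketch the model sees only 5 extra prompt tokens per input, whichever prompt is selected, rather than a single long prompt of up to 100 tokens, which is the efficiency argument the abstract makes.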

2022

Break it Down into BTS: Basic, Tiniest Subword Units for Korean
Nayeon Kim | Jun-Hyung Park | Joon-Young Choi | Eojin Jeon | Youjin Kang | SangKeun Lee
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

We introduce Basic, Tiniest Subword (BTS) units for the Korean language, which are inspired by the invention principle of Hangeul, the Korean writing system. Instead of relying on 51 Korean consonant and vowel letters, we form the letters from BTS units by adding strokes or combining them. To examine the impact of BTS units on Korean language processing, we develop a novel BTS-based word embedding framework that is readily applicable to various models. Our experiments reveal that BTS units significantly improve the performance of Korean word embedding on all intrinsic and extrinsic tasks in our evaluation. In particular, BTS-based word embedding outperforms the state-of-the-art Korean word embedding by 11.8% in word analogy. We further investigate the unique advantages provided by BTS units through in-depth analysis.

Tutoring Helps Students Learn Better: Improving Knowledge Distillation for BERT with Tutor Network
Junho Kim | Jun-Hyung Park | Mingyu Lee | Wing-Lam Mok | Joon-Young Choi | SangKeun Lee
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Pre-trained language models have achieved remarkable success in natural language processing tasks, at the cost of increasing model size. To address this issue, knowledge distillation (KD) has been widely applied to compress language models. However, typical KD approaches for language models have overlooked the difficulty of training examples, and so suffer from the transfer of incorrect teacher predictions and from inefficient training. In this paper, we propose a novel KD framework, Tutor-KD, which improves the distillation effectiveness by controlling the difficulty of training examples during pre-training. We introduce a tutor network that generates samples that are easy for the teacher but difficult for the student, and train it with a carefully designed policy gradient method. Experimental results show that Tutor-KD significantly and consistently outperforms the state-of-the-art KD methods with variously sized student models on the GLUE benchmark, demonstrating that the tutor can effectively generate training examples for the student.

Learning from Missing Relations: Contrastive Learning with Commonsense Knowledge Graphs for Commonsense Inference
Yong-Ho Jung | Jun-Hyung Park | Joon-Young Choi | Mingyu Lee | Junho Kim | Kang-Min Kim | SangKeun Lee
Findings of the Association for Computational Linguistics: ACL 2022

Commonsense inference poses a unique challenge to reason and generate the physical, social, and causal conditions of a given event. Existing approaches to commonsense inference utilize commonsense transformers, which are large-scale language models that learn commonsense knowledge graphs. However, they suffer from a lack of coverage and expressive diversity of the graphs, resulting in a degradation of the representation quality. In this paper, we focus on addressing missing relations in commonsense knowledge graphs, and propose a novel contrastive learning framework called SOLAR. Our framework contrasts sets of semantically similar and dissimilar events, learning richer inferential knowledge compared to existing approaches. Empirical results demonstrate the efficacy of SOLAR in commonsense inference of diverse commonsense knowledge graphs. Specifically, SOLAR outperforms the state-of-the-art commonsense transformer on commonsense inference with ConceptNet by 1.84% on average among 8 automatic evaluation metrics. In-depth analysis of SOLAR sheds light on the effects of the missing relations utilized in learning commonsense knowledge graphs.