2024
Interventional Speech Noise Injection for ASR Generalizable Spoken Language Understanding
YeonJoon Jung | Jaeseong Lee | Seungtaek Choi | Dohyeon Lee | Minsoo Kim | Seung-won Hwang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Recently, pre-trained language models (PLMs) have been increasingly adopted in spoken language understanding (SLU). However, automatic speech recognition (ASR) systems frequently produce inaccurate transcriptions, leading to noisy inputs for SLU models, which can significantly degrade their performance. To address this, our objective is to train SLU models to withstand ASR errors by exposing them to the noises commonly observed in ASR systems, referred to as ASR-plausible noises. Speech noise injection (SNI) methods have pursued this objective by introducing ASR-plausible noises, but we argue that these methods are inherently biased towards specific ASR systems, i.e., towards ASR-specific noises. In this work, we propose a novel and less biased augmentation method that introduces noises plausible to any ASR system, by cutting off the non-causal effect of noises. Experimental results and analyses demonstrate the effectiveness of our proposed methods in enhancing the robustness and generalizability of SLU models against unseen ASR systems, by introducing more diverse and plausible ASR noises in advance.
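To make the SNI setting concrete, the sketch below corrupts clean training transcripts with ASR-style word errors (substitutions, deletions, insertions) before they are fed to an SLU model. It is only a minimal illustration of generic speech noise injection under assumed error rates and a hypothetical confusion table; it does not implement the paper's interventional, ASR-agnostic method.

```python
# Minimal sketch of generic speech noise injection (SNI) for SLU training data.
# It corrupts clean transcripts with ASR-style errors; it does NOT reproduce
# the paper's interventional method. Error rates and the confusion table are
# illustrative assumptions.

import random

def inject_asr_noise(words, sub_rate=0.10, del_rate=0.05, ins_rate=0.05,
                     confusions=None, rng=None):
    """Return a noisy copy of `words` with substitutions, deletions, insertions."""
    rng = rng or random.Random(0)
    confusions = confusions or {}           # e.g. {"their": ["there", "they're"]}
    noisy = []
    for w in words:
        r = rng.random()
        if r < del_rate:
            continue                        # simulate a deletion error
        if r < del_rate + sub_rate:
            # substitute with a confusable word if one is known, else keep it
            noisy.append(rng.choice(confusions.get(w, [w])))
        else:
            noisy.append(w)
        if rng.random() < ins_rate:
            noisy.append(rng.choice(words)) # simulate a spurious insertion
    return noisy

clean = "book a table for two at seven".split()
print(" ".join(inject_asr_noise(clean, confusions={"two": ["to", "too"]})))
```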
COMMIT: Code-Mixing English-Centric Large Language Model for Multilingual Instruction Tuning
Jaeseong Lee | YeonJoon Jung | Seung-won Hwang
Findings of the Association for Computational Linguistics: NAACL 2024
Recently, instruction-tuned large language models (LLMs) have shown strong performance on various tasks, such as question answering. However, the majority of instruction-tuned LLMs are English-centric, which hinders their application to low-resource language QA. In this paper, we propose COde-Mixed Multilingual Instruction Tuning (COMMIT) to adapt English-centric LLMs to low-resource language QA. We point out two main causes of English-centricness: the imbalance of unlabeled data, and English-centric instruction tuning datasets. To move away from English-centric instruction tuning, we propose a code-mixing scheme specialized for instruction tuning, which blocks code-mixing within the English templates, to better exploit its benefits. To overcome the data imbalance, we perform cross-lingual alignment. Most prior work on cross-lingual alignment has focused on making representations similar, which is not desirable for decoder-based LLMs such as LLaMA. Therefore, we propose code-mixed continual causal language modeling to align the decoder. COMMIT improves the exact-match score of low-resource language QA by up to 32x. Code is publicly available.
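As a rough illustration of template-protected code-mixing, the sketch below code-mixes only the task content of an instruction-tuning example while leaving the English template untouched. The template, the toy English-to-Swahili lexicon, and the mixing rate are illustrative assumptions, not the paper's actual data or procedure.

```python
# Minimal sketch of template-protected code-mixing for instruction tuning:
# code-mix the task content, keep the English instruction template intact.
# The lexicon, template, and mixing rate are hypothetical.

import random

TEMPLATE = "Answer the following question.\nQuestion: {question}\nAnswer:"

# toy English -> Swahili lexicon (hypothetical example entries)
LEXICON = {"capital": "mji mkuu", "country": "nchi", "river": "mto"}

def code_mix(text, lexicon, rate=0.5, rng=None):
    """Replace a fraction of lexicon words in `text` with target-language words."""
    rng = rng or random.Random(0)
    out = []
    for tok in text.split():
        key = tok.lower().strip("?.,")
        if key in lexicon and rng.random() < rate:
            out.append(lexicon[key])
        else:
            out.append(tok)
    return " ".join(out)

def build_example(question, lexicon):
    # code-mix only the question; the English template stays untouched
    return TEMPLATE.format(question=code_mix(question, lexicon))

print(build_example("What is the capital of the country crossed by the river Nile?", LEXICON))
```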
2023
Retrieval-augmented Video Encoding for Instructional Captioning
Yeonjoon Jung | Minsoo Kim | Seungtaek Choi | Jihyuk Kim | Minji Seo | Seung-won Hwang
Findings of the Association for Computational Linguistics: ACL 2023
Instructional videos make learning more efficient by providing a detailed multimodal context for each procedure in the instructions. A unique challenge posed by instructional videos is key-object degeneracy, where no single modality sufficiently captures the key objects referred to in the procedure. For machine systems, such degeneracy can degrade the performance of downstream tasks such as dense video captioning, leading to incorrect captions that omit key objects. To repair this degeneracy, we propose a retrieval-based framework that augments the model representations in the presence of key-object degeneracy. We validate the effectiveness and generalizability of our proposed framework over baselines using modalities with key-object degeneracy.
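A minimal sketch of the retrieval-augmented encoding idea, under assumed components: when a video-segment representation is degenerate, retrieve the closest entries from an external text memory and fuse them with the segment representation. The toy memory, similarity search, and fusion weight below are hypothetical, not the paper's architecture.

```python
# Minimal sketch of retrieval-augmented encoding for a degenerate segment.
# The memory, encoder stand-ins, and fusion are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
dim = 16

# toy "memory" of text embeddings describing key objects (hypothetical)
memory_keys = rng.normal(size=(100, dim))   # retrieval keys
memory_vals = rng.normal(size=(100, dim))   # associated text representations

def retrieve(query, keys, vals, k=3):
    """Return the mean of the top-k memory values by cosine similarity."""
    q = query / np.linalg.norm(query)
    k_norm = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    scores = k_norm @ q
    topk = np.argsort(scores)[-k:]
    return vals[topk].mean(axis=0)

def augment(segment_repr, alpha=0.5):
    """Fuse the segment representation with its retrieved text representation."""
    retrieved = retrieve(segment_repr, memory_keys, memory_vals)
    return alpha * segment_repr + (1 - alpha) * retrieved

segment = rng.normal(size=dim)              # stand-in for a video-segment encoding
print(augment(segment).shape)               # (16,)
```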
On Consistency Training for Language-Based Image Editing Interface
Youngwon Lee | Ayoung Lee | Yeonjoon Jung | Seung-won Hwang
Proceedings of the Second Workshop on Natural Language Interfaces
2022
PLM-based World Models for Text-based Games
Minsoo Kim | Yeonjoon Jung | Dohyeon Lee | Seung-won Hwang
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
World models have improved the ability of reinforcement learning agents to operate in a sample-efficient manner, by being trained to predict plausible changes in the underlying environment. As the core tasks of world models are future prediction and commonsense understanding, our claim is that pre-trained language models (PLMs) already provide a strong base upon which to build world models. Worldformer is a recently proposed world model for text-based game environments that is only partially based on PLMs and Transformers. Our distinction is to fully leverage PLMs as actionable world models in text-based game environments, by reformulating generation as constrained decoding that decomposes actions into verb templates and objects. We show that our model improves future valid-action prediction and graph-change prediction. Additionally, we show that our model better reflects commonsense than a standard PLM.
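The sketch below illustrates the constrained-decoding idea in its simplest form: candidate actions are decomposed into verb templates and objects, restricted to objects present in the current observation, and then scored. The templates, object extractor, and stub scorer (standing in for a PLM) are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of action generation as constrained decoding: actions are
# verb templates filled with objects from the current observation, then
# scored. The scorer is a stub standing in for a PLM; everything here is a
# hypothetical illustration.

from itertools import product

VERB_TEMPLATES = ["take {obj}", "open {obj}", "put {obj} in {obj2}"]

def extract_objects(observation):
    """Toy object extractor: objects are assumed to be listed in the observation."""
    return observation["objects"]

def enumerate_actions(observation):
    objs = extract_objects(observation)
    actions = []
    for tmpl in VERB_TEMPLATES:
        slots = tmpl.count("{")
        for combo in product(objs, repeat=slots):
            if len(set(combo)) == slots:                 # no repeated object
                actions.append(tmpl.format(obj=combo[0],
                                           obj2=combo[1] if slots > 1 else ""))
    return [a.strip() for a in actions]

def plm_score(context, action):
    """Stub scorer; a real system would score `context + action` with a PLM."""
    return -len(action)                                  # placeholder heuristic

obs = {"text": "You are in a kitchen.", "objects": ["fridge", "egg"]}
candidates = enumerate_actions(obs)
best = max(candidates, key=lambda a: plm_score(obs["text"], a))
print(best)
```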
Debiasing Event Understanding for Visual Commonsense Tasks
Minji Seo | YeonJoon Jung | Seungtaek Choi | Seung-won Hwang | Bei Liu
Findings of the Association for Computational Linguistics: ACL 2022
We study event understanding as a critical step towards visual commonsense tasks. However, we argue that current object-based event understanding is purely likelihood-based, leading to incorrect event predictions due to biased correlations between events and objects. We propose to mitigate such biases with do-calculus from causality research, and to overcome its limited robustness through an optimized aggregation with association-based prediction. We show the effectiveness of our approach intrinsically, by comparing our generated events with ground-truth event annotations, and extrinsically, on downstream commonsense tasks.
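As a toy illustration of combining interventional and association-based predictions, the sketch below aggregates two event distributions with a log-linear mixture. The distributions and the fixed mixing weight are hypothetical; the paper optimizes the aggregation rather than fixing it.

```python
# Minimal sketch of aggregating an interventional (do-calculus style) event
# predictor with an association-based predictor. Numbers and the mixing
# weight are illustrative assumptions.

import numpy as np

def aggregate(p_assoc, p_interv, lam=0.5):
    """Log-linear mixture of associational and interventional event distributions."""
    logp = lam * np.log(p_assoc) + (1 - lam) * np.log(p_interv)
    p = np.exp(logp - logp.max())           # subtract max for numerical stability
    return p / p.sum()

# toy distributions over three candidate events (hypothetical numbers)
p_assoc = np.array([0.70, 0.20, 0.10])      # likelihood-based, possibly object-biased
p_interv = np.array([0.30, 0.50, 0.20])     # after intervening on the object variable
print(aggregate(p_assoc, p_interv))
```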