Young-Bum Kim

Also published as: Young-bum Kim


2021

pdf bib
AugNLG: Few-shot Natural Language Generation using Self-trained Data Augmentation
Xinnuo Xu | Guoyin Wang | Young-Bum Kim | Sungjin Lee
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Natural Language Generation (NLG) is a key component in a task-oriented dialogue system, which converts the structured meaning representation (MR) to the natural language. For large-scale conversational systems, where it is common to have over hundreds of intents and thousands of slots, neither template-based approaches nor model-based approaches are scalable. Recently, neural NLGs started leveraging transfer learning and showed promising results in few-shot settings. This paper proposes AugNLG, a novel data augmentation approach that combines a self-trained neural retrieval model with a few-shot learned NLU model, to automatically create MR-to-Text data from open-domain texts. The proposed system mostly outperforms the state-of-the-art methods on the FewshotWOZ data in both BLEU and Slot Error Rate. We further confirm improved results on the FewshotSGD data and provide comprehensive analysis results on key components of our system. Our code and data are available at https://github.com/XinnuoXu/AugNLG.

pdf bib
Self-Supervised Contrastive Learning for Efficient User Satisfaction Prediction in Conversational Agents
Mohammad Kachuee | Hao Yuan | Young-Bum Kim | Sungjin Lee
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Turn-level user satisfaction is one of the most important performance metrics for conversational agents. It can be used to monitor the agent’s performance and provide insights about defective user experiences. While end-to-end deep learning has shown promising results, having access to a large number of reliable annotated samples required by these methods remains challenging. In a large-scale conversational system, there is a growing number of newly developed skills, making the traditional data collection, annotation, and modeling process impractical due to the required annotation costs and the turnaround times. In this paper, we suggest a self-supervised contrastive learning approach that leverages the pool of unlabeled data to learn user-agent interactions. We show that the pre-trained models using the self-supervised objective are transferable to the user satisfaction prediction. In addition, we propose a novel few-shot transfer learning approach that ensures better transferability for very small sample sizes. The suggested few-shot method does not require any inner loop optimization process and is scalable to very large datasets and complex models. Based on our experiments using real data from a large-scale commercial system, the suggested approach is able to significantly reduce the required number of annotations, while improving the generalization on unseen skills.

pdf bib
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers
Young-bum Kim | Yunyao Li | Owen Rambow
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers

pdf bib
Learning Slice-Aware Representations with Mixture of Attentions
Cheng Wang | Sungjin Lee | Sunghyun Park | Han Li | Young-Bum Kim | Ruhi Sarikaya
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
A Scalable Framework for Learning From Implicit User Feedback to Improve Natural Language Understanding in Large-Scale Conversational AI Systems
Sunghyun Park | Han Li | Ameen Patel | Sidharth Mudgal | Sungjin Lee | Young-Bum Kim | Spyros Matsoukas | Ruhi Sarikaya
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Natural Language Understanding (NLU) is an established component within a conversational AI or digital assistant system, and it is responsible for producing semantic understanding of a user request. We propose a scalable and automatic approach for improving NLU in a large-scale conversational AI system by leveraging implicit user feedback, with an insight that user interaction data and dialog context have rich information embedded from which user satisfaction and intention can be inferred. In particular, we propose a domain-agnostic framework for curating new supervision data for improving NLU from live production traffic. With an extensive set of experiments, we show the results of applying the framework and improving NLU for a large-scale production system across 10 domains.

2019

pdf bib
Continuous Learning for Large-scale Personalized Domain Classification
Han Li | Jihwan Lee | Sidharth Mudgal | Ruhi Sarikaya | Young-Bum Kim
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Domain classification is the task to map spoken language utterances to one of the natural language understanding domains in intelligent personal digital assistants (IPDAs). This is observed in mainstream IPDAs in industry and third-party domains are developed to enhance the capability of the IPDAs. As more and more new domains are developed very frequently, how to continuously accommodate the new domains still remains challenging. Moreover, if one wants to use personalized information dynamically for better domain classification, it is infeasible to directly adopt existing continual learning approaches. In this paper, we propose CoNDA, a neural-based approach for continuous domain adaption with normalization and regularization. Unlike existing methods that often conduct full model parameter update, CoNDA only updates the necessary parameters in the model for the new domains. Empirical evaluation shows that CoNDA achieves high accuracy on both the accommodated new domains and the existing known domains for which input samples come with personal information, and outperforms the baselines by a large margin.

pdf bib
Locale-agnostic Universal Domain Classification Model in Spoken Language Understanding
Jihwan Lee | Ruhi Sarikaya | Young-Bum Kim
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers)

In this paper, we introduce an approach for leveraging available data across multiple locales sharing the same language to 1) improve domain classification model accuracy in Spoken Language Understanding and user experience even if new locales do not have sufficient data and 2) reduce the cost of scaling the domain classifier to a large number of locales. We propose a locale-agnostic universal domain classification model based on selective multi-task learning that learns a joint representation of an utterance over locales with different sets of domains and allows locales to share knowledge selectively depending on the domains. The experimental results demonstrate the effectiveness of our approach on domain classification task in the scenario of multiple locales with imbalanced data and disparate domain sets. The proposed approach outperforms other baselines models especially when classifying locale-specific domains and also low-resourced domains.

2018

pdf bib
A Scalable Neural Shortlisting-Reranking Approach for Large-Scale Domain Classification in Natural Language Understanding
Young-Bum Kim | Dongchan Kim | Joo-Kyung Kim | Ruhi Sarikaya
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)

Intelligent personal digital assistants (IPDAs), a popular real-life application with spoken language understanding capabilities, can cover potentially thousands of overlapping domains for natural language understanding, and the task of finding the best domain to handle an utterance becomes a challenging problem on a large scale. In this paper, we propose a set of efficient and scalable shortlisting-reranking neural models for effective large-scale domain classification for IPDAs. The shortlisting stage focuses on efficiently trimming all domains down to a list of k-best candidate domains, and the reranking stage performs a list-wise reranking of the initial k-best domains with additional contextual information. We show the effectiveness of our approach with extensive experiments on 1,500 IPDA domains.

pdf bib
Efficient Large-Scale Neural Domain Classification with Personalized Attention
Young-Bum Kim | Dongchan Kim | Anjishnu Kumar | Ruhi Sarikaya
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In this paper, we explore the task of mapping spoken language utterances to one of thousands of natural language understanding domains in intelligent personal digital assistants (IPDAs). This scenario is observed in mainstream IPDAs in industry that allow third parties to develop thousands of new domains to augment built-in first party domains to rapidly increase domain coverage and overall IPDA capabilities. We propose a scalable neural model architecture with a shared encoder, a novel attention mechanism that incorporates personalization information and domain-specific classifiers that solves the problem efficiently. Our architecture is designed to efficiently accommodate incremental domain additions achieving two orders of magnitude speed up compared to full model retraining. We consider the practical constraints of real-time production systems, and design to minimize memory footprint and runtime latency. We demonstrate that incorporating personalization significantly improves domain classification accuracy in a setting with thousands of overlapping domains.

pdf bib
Character-Level Feature Extraction with Densely Connected Networks
Chanhee Lee | Young-Bum Kim | Dongyub Lee | Heuiseok Lim
Proceedings of the 27th International Conference on Computational Linguistics

Generating character-level features is an important step for achieving good results in various natural language processing tasks. To alleviate the need for human labor in generating hand-crafted features, methods that utilize neural architectures such as Convolutional Neural Network (CNN) or Recurrent Neural Network (RNN) to automatically extract such features have been proposed and have shown great results. However, CNN generates position-independent features, and RNN is slow since it needs to process the characters sequentially. In this paper, we propose a novel method of using a densely connected network to automatically extract character-level features. The proposed method does not require any language or task specific assumptions, and shows robustness and effectiveness while being faster than CNN- or RNN-based methods. Evaluating this method on three sequence labeling tasks - slot tagging, Part-of-Speech (POS) tagging, and Named-Entity Recognition (NER) - we obtain state-of-the-art performance with a 96.62 F1-score and 97.73% accuracy on slot tagging and POS tagging, respectively, and comparable performance to the state-of-the-art 91.13 F1-score on NER.

pdf bib
Supervised Domain Enablement Attention for Personalized Domain Classification
Joo-Kyung Kim | Young-Bum Kim
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

In large-scale domain classification for natural language understanding, leveraging each user’s domain enablement information, which refers to the preferred or authenticated domains by the user, with attention mechanism has been shown to improve the overall domain classification performance. In this paper, we propose a supervised enablement attention mechanism, which utilizes sigmoid activation for the attention weighting so that the attention can be computed with more expressive power without the weight sum constraint of softmax attention. The attention weights are explicitly encouraged to be similar to the corresponding elements of the output one-hot vector, and self-distillation is used to leverage the attention information of the other enabled domains. By evaluating on the actual utterances from a large-scale IPDA, we show that our approach significantly improves domain classification performance

2017

pdf bib
Cross-Lingual Transfer Learning for POS Tagging without Cross-Lingual Resources
Joo-Kyung Kim | Young-Bum Kim | Ruhi Sarikaya | Eric Fosler-Lussier
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Training a POS tagging model with crosslingual transfer learning usually requires linguistic knowledge and resources about the relation between the source language and the target language. In this paper, we introduce a cross-lingual transfer learning model for POS tagging without ancillary resources such as parallel corpora. The proposed cross-lingual model utilizes a common BLSTM that enables knowledge transfer from other languages, and private BLSTMs for language-specific representations. The cross-lingual model is trained with language-adversarial training and bidirectional language modeling as auxiliary objectives to better represent language-general information while not losing the information about a specific target language. Evaluating on POS datasets from 14 languages in the Universal Dependencies corpus, we show that the proposed transfer learning model improves the POS tagging performance of the target languages without exploiting any linguistic knowledge between the source language and the target language.

pdf bib
Domain Attention with an Ensemble of Experts
Young-Bum Kim | Karl Stratos | Dongchan Kim
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

An important problem in domain adaptation is to quickly generalize to a new domain with limited supervision given K existing domains. One approach is to retrain a global model across all K + 1 domains using standard techniques, for instance Daumé III (2009). However, it is desirable to adapt without having to re-estimate a global model from scratch each time a new domain with potentially new intents and slots is added. We describe a solution based on attending an ensemble of domain experts. We assume K domain specific intent and slot models trained on respective domains. When given domain K + 1, our model uses a weighted combination of the K domain experts’ feedback along with its own opinion to make predictions on the new domain. In experiments, the model significantly outperforms baselines that do not use domain adaptation and also performs better than the full retraining approach.

pdf bib
Adversarial Adaptation of Synthetic or Stale Data
Young-Bum Kim | Karl Stratos | Dongchan Kim
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Two types of data shift common in practice are 1. transferring from synthetic data to live user data (a deployment shift), and 2. transferring from stale data to current data (a temporal shift). Both cause a distribution mismatch between training and evaluation, leading to a model that overfits the flawed training data and performs poorly on the test data. We propose a solution to this mismatch problem by framing it as domain adaptation, treating the flawed training dataset as a source domain and the evaluation dataset as a target domain. To this end, we use and build on several recent advances in neural domain adaptation such as adversarial training (Ganinet al., 2016) and domain separation network (Bousmalis et al., 2016), proposing a new effective adversarial training scheme. In both supervised and unsupervised adaptation scenarios, our approach yields clear improvement over strong baselines.

2016

pdf bib
Natural Language Model Re-usability for Scaling to Different Domains
Young-Bum Kim | Alexandre Rochette | Ruhi Sarikaya
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Drop-out Conditional Random Fields for Twitter with Huge Mined Gazetteer
Eunsuk Yang | Young-Bum Kim | Ruhi Sarikaya | Yu-Seop Kim
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Task Completion Platform: A self-serve multi-domain goal oriented dialogue platform
Paul Crook | Alex Marin | Vipul Agarwal | Khushboo Aggarwal | Tasos Anastasakos | Ravi Bikkula | Daniel Boies | Asli Celikyilmaz | Senthilkumar Chandramohan | Zhaleh Feizollahi | Roman Holenstein | Minwoo Jeong | Omar Khan | Young-Bum Kim | Elizabeth Krawczyk | Xiaohu Liu | Danko Panic | Vasiliy Radostev | Nikhil Ramesh | Jean-Phillipe Robichaud | Alexandre Rochette | Logan Stromberg | Ruhi Sarikaya
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

pdf bib
Frustratingly Easy Neural Domain Adaptation
Young-Bum Kim | Karl Stratos | Ruhi Sarikaya
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Popular techniques for domain adaptation such as the feature augmentation method of Daumé III (2009) have mostly been considered for sparse binary-valued features, but not for dense real-valued features such as those used in neural networks. In this paper, we describe simple neural extensions of these techniques. First, we propose a natural generalization of the feature augmentation method that uses K + 1 LSTMs where one model captures global patterns across all K domains and the remaining K models capture domain-specific information. Second, we propose a novel application of the framework for learning shared structures by Ando and Zhang (2005) to domain adaptation, and also provide a neural extension of their approach. In experiments on slot tagging over 17 domains, our methods give clear performance improvement over Daumé III (2009) applied on feature-rich CRFs.

pdf bib
Domainless Adaptation by Constrained Decoding on a Schema Lattice
Young-Bum Kim | Karl Stratos | Ruhi Sarikaya
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

In many applications such as personal digital assistants, there is a constant need for new domains to increase the system’s coverage of user queries. A conventional approach is to learn a separate model every time a new domain is introduced. This approach is slow, inefficient, and a bottleneck for scaling to a large number of domains. In this paper, we introduce a framework that allows us to have a single model that can handle all domains: including unknown domains that may be created in the future as long as they are covered in the master schema. The key idea is to remove the need for distinguishing domains by explicitly predicting the schema of queries. Given permitted schema of a query, we perform constrained decoding on a lattice of slot sequences allowed under the schema. The proposed model achieves competitive and often superior performance over the conventional model trained separately per domain.

pdf bib
Scalable Semi-Supervised Query Classification Using Matrix Sketching
Young-Bum Kim | Karl Stratos | Ruhi Sarikaya
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2015

pdf bib
Weakly Supervised Slot Tagging with Partially Labeled Sequences from Web Search Click Logs
Young-Bum Kim | Minwoo Jeong | Karl Stratos | Ruhi Sarikaya
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Part-of-speech Taggers for Low-resource Languages using CCA Features
Young-Bum Kim | Benjamin Snyder | Ruhi Sarikaya
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Shared Tasks of the 2015 Workshop on Noisy User-generated Text: Twitter Lexical Normalization and Named Entity Recognition
Timothy Baldwin | Marie Catherine de Marneffe | Bo Han | Young-Bum Kim | Alan Ritter | Wei Xu
Proceedings of the Workshop on Noisy User-generated Text

pdf bib
New Transfer Learning Techniques for Disparate Label Sets
Young-Bum Kim | Karl Stratos | Ruhi Sarikaya | Minwoo Jeong
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf bib
Pre-training of Hidden-Unit CRFs
Young-Bum Kim | Karl Stratos | Ruhi Sarikaya
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf bib
Compact Lexicon Selection with Spectral Methods
Young-Bum Kim | Karl Stratos | Xiaohu Liu | Ruhi Sarikaya
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

2014

pdf bib
Training a Korean SRL System with Rich Morphological Features
Young-Bum Kim | Heemoon Chae | Benjamin Snyder | Yu-Seop Kim
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2013

pdf bib
Unsupervised Consonant-Vowel Prediction over Hundreds of Languages
Young-Bum Kim | Benjamin Snyder
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Optimal Data Set Selection: An Application to Grapheme-to-Phoneme Conversion
Young-Bum Kim | Benjamin Snyder
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2012

pdf bib
Universal Grapheme-to-Phoneme Prediction Over Latin Alphabets
Young-Bum Kim | Benjamin Snyder
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2011

pdf bib
Universal Morphological Analysis using Structured Nearest Neighbor Prediction
Young-Bum Kim | João Graça | Benjamin Snyder
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing