Seung-won Hwang


ReACC: A Retrieval-Augmented Code Completion Framework
Shuai Lu | Nan Duan | Hojae Han | Daya Guo | Seung-won Hwang | Alexey Svyatkovskiy
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Code completion, which aims to predict the following code token(s) according to the code context, can improve the productivity of software development. Recent work has proved that statistical language modeling with transformers can greatly improve the performance in the code completion task via learning from large-scale source code datasets. However, current approaches focus only on code context within the file or project, i.e., internal context. Our distinction is utilizing “external” context, inspired by the human behavior of copying from related code snippets when writing code. Specifically, we propose a retrieval-augmented code completion framework, leveraging both lexical copying and referring to code with similar semantics by retrieval. We adopt a stage-wise training approach that combines a source code retriever and an auto-regressive language model for programming languages. We evaluate our approach on the code completion task in the Python and Java programming languages, achieving state-of-the-art performance on the CodeXGLUE benchmark.
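The retrieve-then-complete idea in this abstract can be illustrated with a minimal sketch. It is not the ReACC retriever: token-set Jaccard similarity stands in for retrieval, the language model itself is omitted, and the names `tokens`, `jaccard`, `retrieve`, and `build_prompt` are hypothetical.

```python
import re


def tokens(code):
    """Crude lexer: the set of identifiers and numbers in a code string."""
    return set(re.findall(r"[A-Za-z_]\w*|\d+", code))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(context, pool):
    """Return the pool snippet that is lexically closest to the context."""
    ctx = tokens(context)
    return max(pool, key=lambda snippet: jaccard(ctx, tokens(snippet)))


def build_prompt(context, pool):
    """Prepend the retrieved "external" snippet to the unfinished code."""
    return retrieve(context, pool) + "\n" + context


# Hypothetical snippet pool and unfinished code, for illustration only.
pool = [
    "def add(a, b):\n    return a + b",
    "for line in open(path):\n    print(line)",
]
context = "def add_three(a, b, c):\n    return a +"
prompt = build_prompt(context, pool)
```

An auto-regressive model would then complete `prompt` instead of `context` alone, so that it can copy from or refer to the retrieved snippet.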

Plug-and-Play Adaptation for Continuously-updated QA
Kyungjae Lee | Wookje Han | Seung-won Hwang | Hwaran Lee | Joonsuk Park | Sang-Woo Lee
Findings of the Association for Computational Linguistics: ACL 2022

Language models (LMs) have shown great potential as implicit knowledge bases (KBs). For their practical use, knowledge in LMs needs to be updated periodically. However, existing tasks to assess LMs’ efficacy as KBs do not adequately consider multiple large-scale updates. To this end, we first propose a novel task, Continuously-updated QA (CuQA), in which multiple large-scale updates are made to LMs, and the performance is measured with respect to the success in adding and updating knowledge while retaining existing knowledge. We then present LMs with plug-in modules that effectively handle the updates. Experiments conducted on the zsRE QA and NQ datasets show that our method outperforms existing approaches. We find that our method is 4x more effective in terms of updates/forgets ratio, compared to a fine-tuning baseline.

Debiasing Event Understanding for Visual Commonsense Tasks
Minji Seo | YeonJoon Jung | Seungtaek Choi | Seung-won Hwang | Bei Liu
Findings of the Association for Computational Linguistics: ACL 2022

We study event understanding as a critical step towards visual commonsense tasks. Meanwhile, we argue that current object-based event understanding is purely likelihood-based, leading to incorrect event prediction, due to biased correlation between events and objects. We propose to mitigate such biases with do-calculus, proposed in causality research, but overcoming its limited robustness, by an optimized aggregation with association-based prediction. We show the effectiveness of our approach, intrinsically by comparing our generated events with ground-truth event annotation, and extrinsically by downstream commonsense tasks.


Structure-Augmented Keyphrase Generation
Jihyuk Kim | Myeongho Jeong | Seungtaek Choi | Seung-won Hwang
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

This paper studies the keyphrase generation (KG) task for scenarios where structure plays an important role. For example, a scientific publication consists of a short title and a long body, where the title can be used for de-emphasizing unimportant details in the body. Similarly, for short social media posts (e.g., tweets), scarce context can be augmented from titles, though these are often missing. Our contribution is generating/augmenting structure and then injecting this information into the encoding, using existing keyphrases of other documents to complement missing/incomplete titles. We propose novel structure-augmented document encoding approaches that consist of the following two phases: The first phase, generating structure, extends the given document with related but absent keyphrases, augmenting missing context. The second phase, encoding structure, builds a graph of keyphrases and the given document to obtain the structure-aware representation of the augmented text. Our empirical results validate that our proposed structure augmentation and augmentation-aware encoding/decoding can improve KG for both scenarios, outperforming the state-of-the-art.

Query Generation for Multimodal Documents
Kyungho Kim | Kyungjae Lee | Seung-won Hwang | Young-In Song | Seungwook Lee
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

This paper studies the problem of generating likely queries for multimodal documents with images. Our application scenario is enabling efficient “first-stage retrieval” of relevant documents, by attaching generated queries to documents before indexing. We can then index this expanded text to efficiently narrow down to candidate matches using inverted index, so that expensive reranking can follow. Our evaluation results show that our proposed multimodal representation meaningfully improves relevance ranking. More importantly, our framework can achieve the state of the art in the first stage retrieval scenarios.

Robustifying Multi-hop QA through Pseudo-Evidentiality Training
Kyungjae Lee | Seung-won Hwang | Sang-eun Han | Dohyeon Lee
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

This paper studies the bias problem of multi-hop question answering models, namely answering correctly without correct reasoning. One way to robustify these models is to supervise them not only to answer correctly, but also to follow the right reasoning chains. An existing direction is to annotate reasoning chains to train models, requiring expensive additional annotations. In contrast, we propose a new approach to learn evidentiality, deciding whether the answer prediction is supported by correct evidence, without such annotations. Instead, we compare counterfactual changes in answer confidence with and without evidence sentences, to generate “pseudo-evidentiality” annotations. We validate our proposed model on an original set and challenge set in HotpotQA, showing that our method is accurate and robust in multi-hop reasoning.
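The counterfactual comparison described here (answer confidence with and without a sentence) can be sketched as follows. This is only an illustration under stated assumptions, not the paper's training procedure: the function name `pseudo_evidence_labels`, the drop `threshold`, and the keyword-based `toy_confidence` scorer are all hypothetical stand-ins for a real QA model's answer confidence.

```python
def pseudo_evidence_labels(sentences, confidence, threshold=0.2):
    """Mark a sentence as pseudo-evidence if removing it lowers the
    answer confidence by more than `threshold` (an assumed cutoff)."""
    full = confidence(sentences)
    labels = []
    for i in range(len(sentences)):
        without = sentences[:i] + sentences[i + 1:]
        drop = full - confidence(without)
        labels.append(drop > threshold)
    return labels


def toy_confidence(sentences):
    """Hypothetical scorer: fraction of two required keywords that are
    present. A real system would use the QA model's answer probability."""
    text = " ".join(sentences).lower()
    return sum(keyword in text for keyword in ("paris", "capital")) / 2


sentences = [
    "Paris is in France.",
    "France's capital hosts the Louvre.",
    "Bananas are yellow.",
]
labels = pseudo_evidence_labels(sentences, toy_confidence)
```

With this toy scorer, the first two sentences each carry one required keyword, so removing either halves the confidence, while removing the third changes nothing.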


Retrieval-Augmented Controllable Review Generation
Jihyeok Kim | Seungtaek Choi | Reinald Kim Amplayo | Seung-won Hwang
Proceedings of the 28th International Conference on Computational Linguistics

In this paper, we study review generation given a set of attribute identifiers, namely user ID, product ID, and rating. This is a difficult subtask of natural language generation since models are limited to the given identifiers, without any specific descriptive information regarding the inputs, when generating the text. The capacity of these models is thus confined and dependent on how well the models can capture vector representations of attributes. We thus propose to additionally leverage references, which are selected from a large pool of texts labeled with one of the attributes, as textual information that enriches inductive biases of given attributes. With these references, we can now pose the problem as an instance of text-to-text generation, which makes the task easier since texts that are syntactically and semantically similar to the output text are provided as input. Using this framework, we address issues such as selecting references from a large candidate set without textual context and improving the model complexity for generation. Our experiments show that our models improve over previous approaches on both automatic and human evaluation metrics.

Less is More: Attention Supervision with Counterfactuals for Text Classification
Seungtaek Choi | Haeju Park | Jinyoung Yeo | Seung-won Hwang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We aim to leverage human and machine intelligence together for attention supervision. Specifically, we show that human annotation cost can be kept reasonably low, while its quality can be enhanced by machine self-supervision. To this end, we explore the advantage of counterfactual reasoning over the associative reasoning typically used in attention supervision. Our empirical results show that this machine-augmented human attention supervision is more effective than existing methods requiring a higher annotation cost, in text classification tasks, including sentiment analysis and news categorization.

Label-Efficient Training for Next Response Selection
Seungtaek Choi | Myeongho Jeong | Jinyoung Yeo | Seung-won Hwang
Proceedings of SustaiNLP: Workshop on Simple and Efficient Natural Language Processing

This paper studies label augmentation for training dialogue response selection. The existing model is trained by “observational” annotation, where one observed response is annotated as gold. In this paper, we propose “counterfactual augmentation” of pseudo-positive labels. We validate that the effectiveness of augmented labels is comparable to that of positives, such that ours outperforms state-of-the-art methods without augmentation.

SQuAD2-CR: Semi-supervised Annotation for Cause and Rationales for Unanswerability in SQuAD 2.0
Gyeongbok Lee | Seung-won Hwang | Hyunsouk Cho
Proceedings of the 12th Language Resources and Evaluation Conference

Existing machine reading comprehension models are reported to be brittle for adversarially perturbed questions when optimizing only for accuracy, which led to the creation of new reading comprehension benchmarks, such as SQuAD 2.0, which contains such questions. However, despite the super-human accuracy of existing models on such datasets, it is still unclear how the model predicts the answerability of the question, potentially due to the absence of a shared annotation for the explanation. To address this absence, we release the SQuAD2-CR dataset, which contains annotations on unanswerable questions from the SQuAD 2.0 dataset, to enable an explanatory analysis of the model prediction. Specifically, we annotate (1) an explanation of why the most plausible answer span cannot be the answer and (2) which part of the question causes unanswerability. We share intuitions and experimental results on how this dataset can be used to analyze and improve the interpretability of existing reading comprehension model behavior.


Categorical Metadata Representation for Customized Text Classification
Jihyeok Kim | Reinald Kim Amplayo | Kyungjae Lee | Sua Sung | Minji Seo | Seung-won Hwang
Transactions of the Association for Computational Linguistics, Volume 7

The performance of text classification has improved tremendously using intelligently engineered neural-based models, especially those injecting categorical metadata as additional information, e.g., using user/product information for sentiment classification. This information has been used to modify parts of the model (e.g., word embeddings, attention mechanisms) such that results can be customized according to the metadata. We observe that current representation methods for categorical metadata, which are devised for human consumption, are not as effective as claimed in popular classification methods, outperformed even by simple concatenation of categorical features in the final layer of the sentence encoder. We conjecture that categorical features are harder to represent for machine use, as available context only indirectly describes the category, and even such context is often scarce (for tail category). To this end, we propose using basis vectors to effectively incorporate categorical metadata on various parts of a neural-based model. This additionally decreases the number of parameters dramatically, especially when the number of categorical features is large. Extensive experiments on various data sets with different properties are performed and show that through our method, we can represent categorical metadata more effectively to customize parts of the model, including unexplored ones, and increase the performance of the model greatly.

NL2pSQL: Generating Pseudo-SQL Queries from Under-Specified Natural Language Questions
Fuxiang Chen | Seung-won Hwang | Jaegul Choo | Jung-Woo Ha | Sunghun Kim
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Generating SQL code from natural language questions (NL2SQL) is an emerging research area. Existing studies have mainly focused on clear scenarios where specified information is fully given to generate a SQL query. However, in developer forums such as Stack Overflow, questions cover more diverse tasks including table manipulation or performance issues, where a table is not specified. The SQL queries posted on Stack Overflow, which we call Pseudo-SQL (pSQL), usually do not contain table schemas and are not necessarily executable, yet they are sufficient to guide developers. Here we describe a new task, NL2pSQL, to generate pSQL code from natural language questions on under-specified database issues. In addition, we define two new metrics suitable for the proposed NL2pSQL task, Canonical-BLEU and SQL-BLEU, instead of the conventional BLEU. With a baseline model using a sequence-to-sequence architecture integrated with a denoising autoencoder, we confirm the validity of our task. Experiments show that the proposed NL2pSQL approach yields well-formed queries (up to 43% more than a standard Seq2Seq model). Our code and datasets will be publicly released.

Learning with Limited Data for Multilingual Reading Comprehension
Kyungjae Lee | Sunghyun Park | Hojae Han | Jinyoung Yeo | Seung-won Hwang | Juho Lee
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

This paper studies the problem of supporting question answering in a new language with limited training resources. As an extreme scenario, when no such resource exists, one can (1) transfer labels from another language, and (2) generate labels from unlabeled data, using a translator and an automatic labeling function respectively. However, these approaches inevitably introduce noise to the training data, due to translation or generation errors, which requires a judicious use of data with varying confidence. To address this challenge, we propose a weakly-supervised framework that quantifies such noise from automatically generated labels, to deemphasize or fix noisy data in training. On the reading comprehension task, we demonstrate the effectiveness of our model on low-resource languages with varying similarity to English, namely, Korean and French.

MICRON: Multigranular Interaction for Contextualizing RepresentatiON in Non-factoid Question Answering
Hojae Han | Seungtaek Choi | Haeju Park | Seung-won Hwang
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

This paper studies the problem of non-factoid question answering, where the answer may span over multiple sentences. Existing solutions can be categorized into representation- and interaction-focused approaches. We combine their complementary strength, by a hybrid approach allowing multi-granular interactions, but represented at word level, enabling an easy integration with strong word-level signals. Specifically, we propose MICRON: Multigranular Interaction for Contextualizing RepresentatiON, a novel approach which derives contextualized uni-gram representation from n-grams. Our contributions are as follows: First, we enable multi-granular matches between question and answer n-grams. Second, by contextualizing word representation with surrounding n-grams, MICRON can naturally utilize word-based signals for query term weighting, known to be effective in information retrieval. We validate MICRON in two public non-factoid question answering datasets: WikiPassageQA and InsuranceQA, showing our model achieves the state of the art among baselines with reported performances on both datasets.

Evaluating Research Novelty Detection: Counterfactual Approaches
Reinald Kim Amplayo | Seung-won Hwang | Min Song
Proceedings of the Thirteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-13)

In this paper, we explore strategies to evaluate models for the task of research paper novelty detection: Given all papers released at a given date, which of the papers discuss new ideas and influence future research? We find that novelty is not a singular concept, and thus inherently lacks ground-truth annotations with cross-annotator agreement, which is a major obstacle in evaluating these models. The test-of-time award is closest to such an annotation, but it can only be made retrospectively and is extremely scarce. We thus propose to compare and evaluate models using counterfactual simulations. First, we ask models if they can differentiate papers at time t from counterfactual papers from future time t+d. Second, we ask models if they can predict the test-of-time award at t+d. These are proxies that can be agreed upon by human annotators and easily augmented by correlated signals, using which evaluation can be done through four tasks: classification, ranking, correlation and feature selection. We show these proxy evaluation methods complement each other regarding error handling, coverage, interpretability, and scope, and thus altogether contribute to the observation of the relative strength of existing models.

Soft Representation Learning for Sparse Transfer
Haeju Park | Jinyoung Yeo | Gengyu Wang | Seung-won Hwang
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Transfer learning is effective for improving the performance of tasks that are related, and Multi-task learning (MTL) and Cross-lingual learning (CLL) are important instances. This paper argues that hard-parameter sharing, i.e., hard-coding layers shared across different tasks or languages, cannot generalize well when sharing with a loosely related task. Such a case, which we call sparse transfer, might actually hurt performance, a phenomenon known as negative transfer. Our contribution is using adversarial training across tasks to “soft-code” shared and private spaces, preventing the shared space from becoming too sparse. In CLL, our proposed architecture considers another challenge of dealing with low-quality input.


Mining Cross-Cultural Differences and Similarities in Social Media
Bill Yuchen Lin | Frank F. Xu | Kenny Zhu | Seung-won Hwang
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Cross-cultural differences and similarities are common in cross-lingual natural language understanding, especially for research in social media. For instance, people of distinct cultures often hold different opinions on a single named entity. Also, understanding slang terms across languages requires knowledge of cross-cultural similarities. In this paper, we study the problem of computing such cross-cultural differences and similarities. We present a lightweight yet effective approach, and evaluate it on two novel tasks: 1) mining cross-cultural differences of named entities and 2) finding similar terms for slang across languages. Experimental results show that our framework substantially outperforms a number of baseline methods on both tasks. The framework could be useful for machine translation applications and research in computational social science.

Cold-Start Aware User and Product Attention for Sentiment Classification
Reinald Kim Amplayo | Jihyeok Kim | Sua Sung | Seung-won Hwang
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The use of user/product information in sentiment analysis is important, especially for cold-start users/products, whose number of reviews is very limited. However, current models do not deal with the cold-start problem, which is typical in review websites. In this paper, we present Hybrid Contextualized Sentiment Classifier (HCSC), which contains two modules: (1) a fast word encoder that returns word vectors embedded with short and long range dependency features; and (2) Cold-Start Aware Attention (CSAA), an attention mechanism that considers the existence of the cold-start problem when attentively pooling the encoded word vectors. HCSC introduces shared vectors that are constructed from similar users/products, and are used when the original distinct vectors do not have sufficient information (i.e., cold-start). This is decided by a frequency-guided selective gate vector. Our experiments show that in terms of RMSE, HCSC performs significantly better than the compared models on famous datasets, despite having less complexity, and thus can be trained much faster. More importantly, our model performs significantly better than previous models when the training data is sparse and has cold-start problems.
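The role of a frequency-guided gate between distinct and shared vectors can be illustrated with a small sketch. It is an assumption-laden simplification, not the HCSC model: the paper describes a gate vector, whereas here a single hand-written scalar gate `1 - exp(-count / scale)` is used, and the names `frequency_gate`, `blend`, and `scale` are hypothetical.

```python
import math


def frequency_gate(review_count, scale=5.0):
    """Assumed gate in [0, 1): 0 for an unseen user/product, approaching 1
    as the number of reviews grows. `scale` is a hypothetical constant."""
    return 1.0 - math.exp(-review_count / scale)


def blend(distinct, shared, review_count, scale=5.0):
    """Mix the user's own (distinct) vector with the vector shared among
    similar users, trusting the distinct one more as reviews accumulate."""
    g = frequency_gate(review_count, scale)
    return [g * d + (1.0 - g) * s for d, s in zip(distinct, shared)]


distinct = [1.0, 0.0]   # hypothetical vector learned for this user
shared = [0.2, 0.8]     # hypothetical vector built from similar users

cold = blend(distinct, shared, review_count=0)     # cold-start user
warm = blend(distinct, shared, review_count=100)   # well-observed user
```

For a cold-start user the result falls back entirely on the shared vector; for a frequently observed user it is dominated by the distinct vector.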

Visual Choice of Plausible Alternatives: An Evaluation of Image-based Commonsense Causal Reasoning
Jinyoung Yeo | Gyeongbok Lee | Gengyu Wang | Seungtaek Choi | Hyunsouk Cho | Reinald Kim Amplayo | Seung-won Hwang
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Semi-supervised Training Data Generation for Multilingual Question Answering
Kyungjae Lee | Kyoungho Yoon | Sunghyun Park | Seung-won Hwang
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Entity Commonsense Representation for Neural Abstractive Summarization
Reinald Kim Amplayo | Seonjae Lim | Seung-won Hwang
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

A major proportion of a text summary includes important entities found in the original text. These entities build up the topic of the summary. Moreover, they hold commonsense information once they are linked to a knowledge base. Based on these observations, this paper investigates the usage of linked entities to guide the decoder of a neural text summarizer to generate concise and better summaries. To this end, we leverage an off-the-shelf entity linking system (ELS) to extract linked entities and propose Entity2Topic (E2T), a module easily attachable to a sequence-to-sequence model that transforms a list of entities into a vector representation of the topic of the summary. Currently available ELSs are still not sufficiently effective, possibly introducing unresolved ambiguities and irrelevant entities. We resolve the imperfections of the ELS by (a) encoding entities with selective disambiguation, and (b) pooling entity vectors using firm attention. By applying E2T to a simple sequence-to-sequence model with attention mechanism as the base model, we see significant improvements in performance on the Gigaword (sentence to title) and CNN (long document to multi-sentence highlights) summarization datasets by at least 2 ROUGE points.


Probabilistic Prototype Model for Serendipitous Property Mining
Taesung Lee | Seung-won Hwang | Zhongyuan Wang
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Besides providing relevant information, amusing users has been an important role of the web. Many web sites provide serendipitous (unexpected but relevant) information to draw user traffic. In this paper, we study the representative scenario of mining an amusing quiz. An existing approach leverages a knowledge base to mine an unexpected property and then find quiz questions on such a property, based on prototype theory in cognitive science. However, the existing deterministic model is vulnerable to noise in the knowledge base. Therefore, we instead propose to leverage a probabilistic approach to build a prototype that can overcome noise. Our extensive empirical study shows that our approach not only significantly outperforms baselines by 0.06 in accuracy and 0.11 in serendipity, but also shows higher relevance than the traditional relevance-pursuing baseline using TF-IDF.


Map Translation Using Geo-tagged Social Media
Sunyou Lee | Taesung Lee | Seung-won Hwang
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers

Understanding Relation Temporality of Entities
Taesung Lee | Seung-won Hwang
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)


Bootstrapping Entity Translation on Weakly Comparable Corpora
Taesung Lee | Seung-won Hwang
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Enriching Entity Translation Discovery using Selective Temporality
Gae-won You | Young-rok Cha | Jinhan Kim | Seung-won Hwang
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)


Mining Name Translations from Entity Graph Mapping
Gae-won You | Seung-won Hwang | Young-In Song | Long Jiang | Zaiqing Nie
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing