Jong C. Park

Also published as: Jong Park


2021

Unsupervised Document Expansion for Information Retrieval with Stochastic Text Generation
Soyeong Jeong | Jinheon Baek | ChaeHun Park | Jong Park
Proceedings of the Second Workshop on Scholarly Document Processing

One of the challenges in information retrieval (IR) is the vocabulary mismatch problem, which happens when the terms in queries and documents are lexically different but semantically similar. While recent work has proposed to expand the queries or documents by enriching their representations with additional relevant terms to address this challenge, such approaches usually require a large volume of query-document pairs to train an expansion model. In this paper, we propose an Unsupervised Document Expansion with Generation (UDEG) framework with a pre-trained language model, which generates diverse supplementary sentences for the original document without using labels on query-document pairs for training. When generating sentences, we further stochastically perturb their embeddings to generate more diverse sentences for document expansion. We validate our framework on two standard IR benchmark datasets. The results show that our framework significantly outperforms relevant expansion baselines for IR.

Generating Negative Samples by Manipulating Golden Responses for Unsupervised Learning of a Response Evaluation Model
ChaeHun Park | Eugene Jang | Wonsuk Yang | Jong Park
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Evaluating the quality of responses generated by open-domain conversation systems is a challenging task. This is partly because there can be multiple appropriate responses to a given dialogue history. Reference-based metrics that rely on comparisons to a set of known correct responses often fail to account for this variety, and consequently correlate poorly with human judgment. To address this problem, researchers have investigated the possibility of assessing response quality without using a set of known correct responses. RUBER demonstrated that an automatic response evaluation model could be built using unsupervised learning for the next-utterance prediction (NUP) task. For the unsupervised learning of such a model, we propose a method of manipulating a golden response to create a new negative response that is designed to be inappropriate within the context while maintaining high similarity with the original golden response. We find, from our experiments on English datasets, that using the negative samples generated by our method alongside random negative samples can increase the model’s correlation with human evaluations. The process of generating such negative samples is automated and does not rely on human annotation.

A Large-scale Comprehensive Abusiveness Detection Dataset with Multifaceted Labels from Reddit
Hoyun Song | Soo Hyun Ryu | Huije Lee | Jong Park
Proceedings of the 25th Conference on Computational Natural Language Learning

As users in online communities suffer from severe side effects of abusive language, many researchers have attempted to detect abusive texts from social media, presenting several datasets for such detection. However, none of them contain both comprehensive labels and contextual information, which are essential for thoroughly detecting all kinds of abusiveness from texts, since datasets with such fine-grained features demand a significant amount of annotation, leading to much greater complexity. In this paper, we propose a Comprehensive Abusiveness Detection Dataset (CADD), collected from English Reddit posts, with multifaceted labels and contexts. Our dataset is annotated hierarchically for efficient annotation through crowdsourcing on a large scale. We also empirically explore the characteristics of our dataset and provide a detailed analysis for novel insights. The results of our experiments with strong pre-trained natural language understanding models on our dataset show that our dataset gives rise to meaningful performance, assuring its practicality for abusive language detection.

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations
Heng Ji | Jong C. Park | Rui Xia
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations

2019

Nonsense!: Quality Control via Two-Step Reason Selection for Annotating Local Acceptability and Related Attributes in News Editorials
Wonsuk Yang | Seungwon Yoon | Ada Carpenter | Jong Park
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Annotation quality control is a critical aspect of building reliable corpora through linguistic annotation. In this study, we present a simple but powerful quality control method using two-step reason selection. We gathered sentential annotations of local acceptance and three related attributes through a crowdsourcing platform. For each attribute, the reason for the choice of the attribute value is selected in a two-step manner. The options given for reason selection were designed to facilitate the detection of a nonsensical reason selection. We assume that a sentential annotation that contains a nonsensical reason is less reliable than one without such a reason. Our method, based solely on this assumption, is found to retain the annotations of satisfactory quality out of the entire set of annotations, which is mixed with those of low quality.

Generating Sentential Arguments from Diverse Perspectives on Controversial Topic
ChaeHun Park | Wonsuk Yang | Jong Park
Proceedings of the Second Workshop on Natural Language Processing for Internet Freedom: Censorship, Disinformation, and Propaganda

Considering diverse aspects of an argumentative issue is an essential step for mitigating a biased opinion and making reasonable decisions. A related generation model can produce flexible results that cover a wide range of topics, compared to the retrieval-based method that may show unstable performance for unseen data. In this paper, we study the problem of generating sentential arguments from multiple perspectives, and propose a neural method to address this problem. Our model, ArgDiver (Argument generation model from diverse perspectives), which is in a way a conversational system, successfully generates high-quality sentential arguments. At the same time, the arguments automatically generated by our model show a higher diversity than those generated by any of the baseline models. We believe that our work provides evidence for the potential of a good generation model in providing diverse perspectives on a controversial topic.

Computer Assisted Annotation of Tension Development in TED Talks through Crowdsourcing
Seungwon Yoon | Wonsuk Yang | Jong Park
Proceedings of the First Workshop on Aggregating and Analysing Crowdsourced Annotations for NLP

We propose a method of machine-assisted annotation for the identification of tension development, annotating whether the tension is increasing, decreasing, or staying unchanged. We use a neural network based prediction model, whose predicted results are given to the annotators as initial values for the options that they are asked to choose. By presenting such initial values to the annotators, the annotation task becomes an evaluation task where the annotators inspect whether or not the predicted results are correct. To demonstrate the effectiveness of our method, we performed the annotation task in both in-house and crowdsourced environments. For the crowdsourced environment, we compared the annotation results with and without our method of machine-assisted annotation. We find that the results with our method showed a higher agreement with the gold standard than those without, though our method had little effect on reducing the time for annotation. Our code for the experiment is made publicly available.

2018

Feature Attention Network: Interpretable Depression Detection from Social Media
Hoyun Song | Jinseon You | Jin-Woo Chung | Jong C. Park
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation

2017

Extraction of Gene-Environment Interaction from the Biomedical Literature
Jinseon You | Jin-Woo Chung | Wonsuk Yang | Jong C. Park
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Genetic information in the literature has been extensively looked into for the purpose of discovering the etiology of a disease. As the gene-disease relation is sensitive to external factors, identifying these factors is important for studying a disease. Environmental influences, which are usually called Gene-Environment interaction (GxE), have been considered as important factors and have been extensively researched in biology. Nevertheless, there is still a lack of systems for automatic GxE extraction from the biomedical literature due to new challenges: (1) there are no preprocessing tools and corpora for GxE, (2) expressions of GxE are often quite implicit, and (3) document-level comprehension is usually required. We propose to overcome these challenges with neural network models and show that a modified sequence-to-sequence model with a static RNN decoder performs well in GxE recognition.

2015

Corpus annotation with a linguistic analysis of the associations between event mentions and spatial expressions
Jin-Woo Chung | Jinseon You | Jong C. Park
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation

CoMAGD: Annotation of Gene-Depression Relations
Rize Jin | Jinseon You | Jin-Woo Chung | Hee-Jin Lee | Maria Wolters | Jong Park
Proceedings of BioNLP 15

2013

Proceedings of the Sixth International Joint Conference on Natural Language Processing
Ruslan Mitkov | Jong C. Park
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Parsing Dependency Paths to Identify Event-Argument Relations
Seung-Cheol Baek | Jong Park
Proceedings of the Sixth International Joint Conference on Natural Language Processing

2012

Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Haizhou Li | Chin-Yew Lin | Miles Osborne | Gary Geunbae Lee | Jong C. Park
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Haizhou Li | Chin-Yew Lin | Miles Osborne | Gary Geunbae Lee | Jong C. Park
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Product Name Classification for Product Instance Distinction
Hye-Jin Min | Jong C. Park
Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation

2011

Detecting and Blocking False Sentiment Propagation
Hye-Jin Min | Jong C. Park
Proceedings of 5th International Joint Conference on Natural Language Processing

2009

Toward finer-grained sentiment identification in product reviews through linguistic and ontological analyses
Hye-Jin Min | Jong C. Park
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

2007

Analysis of Indirect Uses of Interrogative Sentences Carrying Anger
Hye-Jin Min | Jong C. Park
Proceedings of the 21st Pacific Asia Conference on Language, Information and Computation

2005

From Text to Sign Language: Exploiting the Spatial and Motioning Dimension
Ji-Won Choi | Hee-Jin Lee | Jong C. Park
Proceedings of the 19th Pacific Asia Conference on Language, Information and Computation

Vowel Sound Disambiguation for Intelligible Korean Speech Synthesis
Ho-Joon Lee | Jong C. Park
Proceedings of the 19th Pacific Asia Conference on Language, Information and Computation

2004

BioAR: Anaphora Resolution for Relating Protein Names to Proteome Database Entries
Jung-Jae Kim | Jong C. Park
Proceedings of the Conference on Reference Resolution and Its Applications

2002

Natural Language Interpretations for Heterogeneous Database Access
Hodong Lee | Jong C. Park
COLING 2002: The 19th International Conference on Computational Linguistics

2001

Automatic Augmentation of Translation Dictionary with Database Terminologies In Multilingual Query Interpretation
Hodong Lee | Jong C. Park
Proceedings of the ACL 2001 Workshop on Human Language Technology and Knowledge Management

2000

Informed Parsing for Coordination with Combinatory Categorial Grammar
Jong C. Park | Hyung Joon Cho
COLING 2000 Volume 2: The 18th International Conference on Computational Linguistics

1999

Lexical selection with a target language monolingual corpus and an MRD
Hyun Ah Lee | Jong C. Park | Gil Chang Kim
Proceedings of the 8th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages

1997

An English Grammar Checker as a Writing Aid for Students of English as a Second Language
Jong C. Park | Martha Palmer | Clay Washburn
Fifth Conference on Applied Natural Language Processing: Descriptions of System Demonstrations and Videos

1995

Quantifier Scope and Constituency
Jong C. Park
33rd Annual Meeting of the Association for Computational Linguistics

1992

A Unification-Based Semantic Interpretation for Coordinate Constructs
Jong C. Park
30th Annual Meeting of the Association for Computational Linguistics