Di Jin


2022

On the Limits of Evaluating Embodied Agent Model Generalization Using Validation Sets
Hyounghun Kim | Aishwarya Padmakumar | Di Jin | Mohit Bansal | Dilek Hakkani-Tur
Proceedings of the Third Workshop on Insights from Negative Results in NLP

Natural language guided embodied task completion is a challenging problem since it requires understanding natural language instructions, aligning them with egocentric visual observations, and choosing appropriate actions to execute in the environment to produce desired changes. We experiment with augmenting a transformer model for this task with modules that effectively utilize a wider field of view and learn to choose whether the next step requires a navigation or manipulation action. We observed that the proposed modules resulted in improved, and in fact state-of-the-art, performance on an unseen validation set of a popular benchmark dataset, ALFRED. However, our best model selected using the unseen validation set underperforms on the unseen test split of ALFRED, indicating that performance on the unseen validation set may not in itself be a sufficient indicator of whether model improvements generalize to unseen test sets. We highlight this result because we believe it may be a wider phenomenon in machine learning tasks, one that is noticeable primarily in benchmarks that limit evaluations on test splits, and because it highlights the need to modify benchmark design to better account for variance in model performance.

Enhancing Knowledge Selection for Grounded Dialogues via Document Semantic Graphs
Sha Li | Mahdi Namazifar | Di Jin | Mohit Bansal | Heng Ji | Yang Liu | Dilek Hakkani-Tur
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Providing conversation models with background knowledge has been shown to make open-domain dialogues more informative and engaging. Existing models treat knowledge selection as a sentence ranking or classification problem where each sentence is handled individually, ignoring the internal semantic connection between sentences. In this work, we propose to automatically convert the background knowledge documents into document semantic graphs and then perform knowledge selection over such graphs. Our document semantic graphs preserve sentence-level information through the use of sentence nodes and provide concept connections between sentences. We apply multi-task learning to perform sentence-level knowledge selection and concept-level knowledge selection, showing that it improves sentence-level selection. Our experiments show that our semantic graph-based knowledge selection improves over sentence selection baselines for both the knowledge selection task and the end-to-end response generation task on HollE and improves generalization on unseen topics in WoW.

Sketching as a Tool for Understanding and Accelerating Self-attention for Long Sequences
Yifan Chen | Qi Zeng | Dilek Hakkani-Tur | Di Jin | Heng Ji | Yun Yang
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Transformer-based models are not efficient in processing long sequences due to the quadratic space and time complexity of the self-attention modules. To address this limitation, Linformer and Informer reduce the quadratic complexity to linear (modulo logarithmic factors) via low-dimensional projection and row selection, respectively. These two models are intrinsically connected, and to understand their connection we introduce a theoretical framework of matrix sketching. Based on the theoretical analysis, we propose Skeinformer to accelerate self-attention and further improve the accuracy of matrix approximation to self-attention with column sampling, adaptive row normalization and pilot sampling reutilization. Experiments on the Long Range Arena benchmark demonstrate that our methods outperform alternatives with a consistently smaller time/space footprint.
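
The column-sampling idea can be illustrated with a small pure-Python sketch. It assumes uniform sampling without replacement and re-normalizes each row over the sampled columns only; Skeinformer's actual sampling probabilities, adaptive row normalization, and pilot sampling reutilization are not reproduced, and the function names are hypothetical.

```python
import math
import random


def softmax_attention(Q, K, V):
    """Exact softmax attention: row i = sum_j softmax_j(q_i . k_j / sqrt(d)) * v_j."""
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        m = max(scores)  # subtract the max for numerical stability
        w = [math.exp(s - m) for s in scores]
        z = sum(w)  # row normalizer
        out.append([sum(wj * v[c] for wj, v in zip(w, V)) / z
                    for c in range(len(V[0]))])
    return out


def column_sampled_attention(Q, K, V, k, rng):
    """Approximate attention from k uniformly sampled key/value columns.

    Each row is re-normalized over the sampled columns only, so the uniform
    importance weights 1/(k * p_j) cancel. Cost is O(n * k * d) instead of
    O(n^2 * d). Hypothetical simplification, not Skeinformer itself.
    """
    idx = rng.sample(range(len(K)), k)  # sampling without replacement
    return softmax_attention(Q, [K[j] for j in idx], [V[j] for j in idx])
```

With k equal to the sequence length the sketch recovers exact attention; with smaller k each output row is still a convex combination of the sampled value rows.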

Deep Learning for Text Style Transfer: A Survey
Di Jin | Zhijing Jin | Zhiting Hu | Olga Vechtomova | Rada Mihalcea
Computational Linguistics, Volume 48, Issue 1 - March 2022

Text style transfer is an important task in natural language generation, which aims to control certain attributes in the generated text, such as politeness, emotion, humor, and many others. It has a long history in the field of natural language processing, and recently has re-gained significant attention thanks to the promising performance brought by deep neural models. In this article, we present a systematic survey of the research on neural text style transfer, spanning over 100 representative articles since the first neural text style transfer work in 2017. We discuss the task formulation, existing datasets and subtasks, evaluation, as well as the rich methodologies in the presence of parallel and non-parallel data. We also provide discussions on a variety of important topics regarding the future development of this task.

DMix: Adaptive Distance-aware Interpolative Mixup
Ramit Sawhney | Megh Thakkar | Shrey Pandit | Ritesh Soun | Di Jin | Diyi Yang | Lucie Flek
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Interpolation-based regularisation methods such as Mixup, which generate virtual training samples, have proven to be effective for various tasks and modalities. We extend Mixup and propose DMix, an adaptive distance-aware interpolative Mixup that selects samples based on their diversity in the embedding space. DMix leverages the hyperbolic space as a similarity measure among input samples for a richer encoded representation. DMix achieves state-of-the-art results on sentence classification over existing data augmentation methods on 8 benchmark datasets across English, Arabic, Turkish, and Hindi languages while achieving benchmark F1 scores in 3 times fewer iterations. We probe the effectiveness of DMix in conjunction with various similarity measures and qualitatively analyze the different components. Being generalizable, DMix can be applied to various tasks, models and modalities.
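
A minimal sketch of distance-aware partner selection followed by standard Mixup. It assumes embeddings already lie inside the unit Poincaré ball and assumes "most distant partner" as the diversity criterion; DMix's actual selection rule may differ, the mixing step here is plain Euclidean Mixup with lam drawn from Beta(alpha, alpha), and the function names are hypothetical.

```python
import math
import random


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincare ball."""
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


def distance_aware_mixup(embeddings, labels, i, alpha, rng):
    """Mix sample i with its most distant partner under the Poincare distance.

    Assumption: "most distant" is the diversity criterion; the interpolation
    itself is ordinary Mixup with lam ~ Beta(alpha, alpha).
    """
    j = max((k for k in range(len(embeddings)) if k != i),
            key=lambda k: poincare_distance(embeddings[i], embeddings[k]))
    lam = rng.betavariate(alpha, alpha)
    x_mix = [lam * a + (1 - lam) * b for a, b in zip(embeddings[i], embeddings[j])]
    y_mix = [lam * a + (1 - lam) * b for a, b in zip(labels[i], labels[j])]
    return j, lam, x_mix, y_mix
```

Because the labels are one-hot, the mixed label remains a valid probability vector for any lam in [0, 1].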

Empowering parameter-efficient transfer learning by recognizing the kernel structure in self-attention
Yifan Chen | Devamanyu Hazarika | Mahdi Namazifar | Yang Liu | Di Jin | Dilek Hakkani-Tur
Findings of the Association for Computational Linguistics: NAACL 2022

The massive number of trainable parameters in pre-trained language models (PLMs) makes them hard to deploy to multiple downstream tasks. To address this issue, parameter-efficient transfer learning methods have been proposed to tune only a few parameters during fine-tuning while freezing the rest. This paper looks at existing methods along this line through the kernel lens. Motivated by the connection between self-attention in transformer-based PLMs and kernel learning, we propose kernel-wise adapters, namely Kernel-mix, that utilize the kernel structure in self-attention to guide the assignment of the tunable parameters. These adapters use guidelines found in classical kernel learning and enable separate parameter tuning for each attention head. Our empirical results, over a diverse set of natural language generation and understanding tasks, show that our proposed adapters can attain or improve the strong performance of existing baselines.

2021

Proceedings of the 1st Workshop on Meta Learning and Its Applications to Natural Language Processing
Hung-Yi Lee | Mitra Mohtarami | Shang-Wen Li | Di Jin | Mandy Korpusik | Shuyan Dong | Ngoc Thang Vu | Dilek Hakkani-Tur
Proceedings of the 1st Workshop on Meta Learning and Its Applications to Natural Language Processing

Can I Be of Further Assistance? Using Unstructured Knowledge Access to Improve Task-oriented Conversational Modeling
Di Jin | Seokhwan Kim | Dilek Hakkani-Tur
Proceedings of the 1st Workshop on Document-grounded Dialogue and Conversational Question Answering (DialDoc 2021)

Most prior work on task-oriented dialogue systems is restricted to limited coverage of domain APIs. However, users oftentimes have requests that are out of the scope of these APIs. This work focuses on responding to these beyond-API-coverage user turns by incorporating external, unstructured knowledge sources. Our approach works in a pipelined manner with knowledge-seeking turn detection, knowledge selection, and response generation in sequence. We introduce novel data augmentation methods for the first two steps and demonstrate that the use of information extracted from dialogue context improves the knowledge selection and end-to-end performances. Through experiments, we achieve state-of-the-art performance for both automatic and human evaluation metrics on the DSTC9 Track 1 benchmark dataset, validating the effectiveness of our contributions.

Towards Zero and Few-shot Knowledge-seeking Turn Detection in Task-orientated Dialogue Systems
Di Jin | Shuyang Gao | Seokhwan Kim | Yang Liu | Dilek Hakkani-Tur
Proceedings of the 3rd Workshop on Natural Language Processing for Conversational AI

Most prior work on task-oriented dialogue systems is restricted to supporting domain APIs. However, users may have requests that are out of the scope of these APIs. This work focuses on identifying such user requests. Existing methods for this task mainly rely on fine-tuning pre-trained models on large annotated data. We propose a novel method, REDE, based on adaptive representation learning and density estimation. REDE can be applied to zero-shot cases, and quickly learns a high-performing detector with only a few shots by updating less than 3K parameters. We demonstrate REDE’s competitive performance on DSTC9 data and our newly collected test set.
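
The density-estimation half of such a detector can be sketched as follows. This assumes each user turn has already been encoded as a fixed-length feature vector and substitutes a diagonal Gaussian with a squared Mahalanobis-distance threshold; REDE's adaptive representation learning is not reproduced, and the names and threshold are hypothetical. Only in-API turns are needed to fit the density, which is what makes a zero-shot setting possible.

```python
from dataclasses import dataclass
from statistics import fmean, pvariance
from typing import List


@dataclass
class DiagonalGaussian:
    mean: List[float]
    var: List[float]


def fit_density(in_api_vectors, eps=1e-6):
    """Fit a diagonal Gaussian to representations of in-API (non-knowledge-seeking) turns."""
    dims = list(zip(*in_api_vectors))
    return DiagonalGaussian([fmean(d) for d in dims],
                            [pvariance(d) + eps for d in dims])  # eps avoids division by zero


def mahalanobis_sq(g, x):
    """Squared Mahalanobis distance under the diagonal covariance."""
    return sum((xi - m) ** 2 / v for xi, m, v in zip(x, g.mean, g.var))


def is_knowledge_seeking(g, x, threshold):
    """Flag a turn as beyond API coverage when it falls in a low-density region."""
    return mahalanobis_sq(g, x) > threshold
```

In a few-shot setting, the threshold (and, in the actual method, the representation itself) would be tuned on the handful of labeled knowledge-seeking turns.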

HypMix: Hyperbolic Interpolative Data Augmentation
Ramit Sawhney | Megh Thakkar | Shivam Agarwal | Di Jin | Diyi Yang | Lucie Flek
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Interpolation-based regularisation methods for data augmentation have proven to be effective for various tasks and modalities. These methods involve performing mathematical operations over the raw input samples or their latent state representations, vectors that often possess complex hierarchical geometries. However, these operations are performed in the Euclidean space, simplifying these representations, which may lead to distorted and noisy interpolations. We propose HypMix, a novel model-, data-, and modality-agnostic interpolative data augmentation technique operating in the hyperbolic space, which captures the complex geometry of input and hidden state hierarchies better than its contemporaries. We evaluate HypMix on benchmark and low resource datasets across speech, text, and vision modalities, showing that HypMix consistently outperforms state-of-the-art data augmentation techniques. In addition, we demonstrate the use of HypMix in semi-supervised settings. We further probe into the adversarial robustness and qualitative inferences we draw from HypMix that elucidate the efficacy of the Riemannian hyperbolic manifolds for interpolation-based data augmentation.

2020

Tasty Burgers, Soggy Fries: Probing Aspect Robustness in Aspect-Based Sentiment Analysis
Xiaoyu Xing | Zhijing Jin | Di Jin | Bingning Wang | Qi Zhang | Xuanjing Huang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Aspect-based sentiment analysis (ABSA) aims to predict the sentiment towards a specific aspect in the text. However, existing ABSA test sets cannot be used to probe whether a model can distinguish the sentiment of the target aspect from the non-target aspects. To solve this problem, we develop a simple but effective approach to enrich ABSA test sets. Specifically, we generate new examples to disentangle the confounding sentiments of the non-target aspects from the target aspect’s sentiment. Based on the SemEval 2014 dataset, we construct the Aspect Robustness Test Set (ARTS) as a comprehensive probe of the aspect robustness of ABSA models. According to human evaluation, over 92% of the ARTS data show high fluency and the desired sentiment on all aspects. Using ARTS, we analyze the robustness of nine ABSA models, and observe, surprisingly, that their accuracy drops by up to 69.73%. We explore several ways to improve aspect robustness, and find that adversarial training can improve models’ performance on ARTS by up to 32.85%. Our code and new test set are available at https://github.com/zhijing-jin/ARTS_TestSet

TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP
John Morris | Eli Lifland | Jin Yong Yoo | Jake Grigsby | Di Jin | Yanjun Qi
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

While there has been substantial research using adversarial attacks to analyze NLP models, each attack is implemented in its own code repository. It remains challenging to develop NLP attacks and utilize them to improve model performance. This paper introduces TextAttack, a Python framework for adversarial attacks, data augmentation, and adversarial training in NLP. TextAttack builds attacks from four components: a goal function, a set of constraints, a transformation, and a search method. TextAttack’s modular design enables researchers to easily construct attacks from combinations of novel and existing components. TextAttack provides implementations of 16 adversarial attacks from the literature and supports a variety of models and datasets, including BERT and other transformers, and all GLUE tasks. TextAttack also includes data augmentation and adversarial training modules for using components of adversarial attacks to improve model accuracy and robustness. TextAttack is democratizing NLP: anyone can try data augmentation and adversarial training on any model or dataset, with just a few lines of code. Code and tutorials are available at https://github.com/QData/TextAttack.
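
The four-component decomposition (goal function, constraints, transformation, search method) can be illustrated with a toy, self-contained word-swap attack. This does not use TextAttack's actual classes or API; the victim model, synonym table, and every function name below are hypothetical.

```python
from typing import Dict, Iterator, List, Optional

# Toy victim model vocabulary (hypothetical).
POSITIVE = {"good", "great"}
NEGATIVE = {"bad", "dull"}


def toy_logit(words: List[str]) -> int:
    """Toy victim model score: positive minus negative word counts."""
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def toy_predict(words: List[str]) -> int:
    """Label 1 (positive) only when the score is strictly positive."""
    return 1 if toy_logit(words) > 0 else 0


def untargeted_goal(original_label: int):
    """Goal function: (succeeded?, progress score); higher score = closer to a label flip."""
    sign = -1 if original_label == 1 else 1
    return lambda words: (toy_predict(words) != original_label, sign * toy_logit(words))


def max_words_changed(limit: int):
    """Constraint: a candidate may differ from the original in at most `limit` positions."""
    return lambda orig, cand: sum(a != b for a, b in zip(orig, cand)) <= limit


def synonym_swap(table: Dict[str, List[str]]):
    """Transformation: every single-word substitution listed in `table`."""
    def transform(words: List[str]) -> Iterator[List[str]]:
        for i, w in enumerate(words):
            for s in table.get(w, []):
                yield words[:i] + [s] + words[i + 1:]
    return transform


def greedy_search(original, goal, constraints, transform, max_steps=5) -> Optional[List[str]]:
    """Search method: take the best-scoring valid candidate each step until the goal is met."""
    current = list(original)
    for _ in range(max_steps):
        done, score = goal(current)
        if done:
            return current
        valid = [c for c in transform(current) if all(ok(original, c) for ok in constraints)]
        if not valid:
            return None
        best = max(valid, key=lambda c: goal(c)[1])
        if goal(best)[1] <= score:
            return None  # no valid candidate makes progress
        current = best
    return current if goal(current)[0] else None
```

Swapping only the constraint list (for example, allowing one versus two changed words) changes whether the same transformation and search succeed, which is the kind of recombination a modular design makes cheap.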

Augmenting NLP models using Latent Feature Interpolations
Amit Jindal | Arijit Ghosh Chowdhury | Aniket Didolkar | Di Jin | Ramit Sawhney | Rajiv Ratn Shah
Proceedings of the 28th International Conference on Computational Linguistics

Models with a large number of parameters are prone to over-fitting and often fail to capture the underlying input distribution. We introduce Emix, a data augmentation method that uses interpolations of word embeddings and hidden layer representations to construct virtual examples. We show that Emix yields significant improvements over previously used interpolation-based regularizers and data augmentation techniques. We also demonstrate how our proposed method is more robust to sparsification. We highlight the merits of our proposed methodology by performing thorough quantitative and qualitative assessments.

Hooks in the Headline: Learning to Generate Headlines with Controlled Styles
Di Jin | Zhijing Jin | Joey Tianyi Zhou | Lisa Orii | Peter Szolovits
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Current summarization systems only produce plain, factual headlines, far from the practical needs for the exposure and memorability of the articles. We propose a new task, Stylistic Headline Generation (SHG), to enrich the headlines with three style options (humor, romance and clickbait), thus attracting more readers. With no style-specific article-headline pairs (only a standard headline summarization dataset and mono-style corpora), our method TitleStylist generates stylistic headlines by combining the summarization and reconstruction tasks into a multitasking framework. We also introduce a novel parameter sharing scheme to further disentangle the style from text. Through both automatic and human evaluation, we demonstrate that TitleStylist can generate relevant, fluent headlines with three target styles: humor, romance, and clickbait. The attraction score of headlines generated by our model outperforms that of the state-of-the-art summarization model by 9.68%, and even outperforms human-written references.

Multi-source Meta Transfer for Low Resource Multiple-Choice Question Answering
Ming Yan | Hao Zhang | Di Jin | Joey Tianyi Zhou
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Multiple-choice question answering (MCQA) is one of the most challenging tasks in machine reading comprehension since it requires more advanced reading comprehension skills such as logical reasoning, summarization, and arithmetic operations. Unfortunately, most existing MCQA datasets are small in size, which increases the difficulty of model learning and generalization. To address this challenge, we propose a multi-source meta transfer (MMT) for low-resource MCQA. In this framework, we first extend meta learning by incorporating multiple training sources to learn a generalized feature representation across domains. To bridge the distribution gap between training sources and the target, we further introduce the meta transfer that can be integrated into the multi-source meta training. More importantly, the proposed MMT is independent of backbone language models. Extensive experiments demonstrate the superiority of MMT over state-of-the-art methods, and continuous improvements can be achieved on different backbone networks on both supervised and unsupervised domain adaptation settings.

From Machine Reading Comprehension to Dialogue State Tracking: Bridging the Gap
Shuyang Gao | Sanchit Agarwal | Di Jin | Tagyoung Chung | Dilek Hakkani-Tur
Proceedings of the 2nd Workshop on Natural Language Processing for Conversational AI

Dialogue state tracking (DST) is at the heart of task-oriented dialogue systems. However, the scarcity of labeled data is an obstacle to building accurate and robust state tracking systems that work across a variety of domains. Existing approaches generally require some dialogue data with state information and their ability to generalize to unknown domains is limited. In this paper, we propose using machine reading comprehension (RC) in state tracking from two perspectives: model architectures and datasets. We divide the slot types in dialogue state into categorical or extractive to borrow the advantages from both multiple-choice and span-based reading comprehension models. Our method achieves near the current state-of-the-art in joint goal accuracy on MultiWOZ 2.1 given full training data. More importantly, by leveraging machine reading comprehension datasets, our method outperforms the existing approaches by a large margin in few-shot scenarios when the availability of in-domain data is limited. Lastly, even without any state tracking data, i.e., zero-shot scenario, our proposed approach achieves greater than 90% average slot accuracy in 12 out of 30 slots in MultiWOZ 2.1.

2019

IMaT: Unsupervised Text Attribute Transfer via Iterative Matching and Translation
Zhijing Jin | Di Jin | Jonas Mueller | Nicholas Matthews | Enrico Santus
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Text attribute transfer aims to automatically rewrite sentences such that they possess certain linguistic attributes, while simultaneously preserving their semantic content. This task remains challenging due to a lack of supervised parallel data. Existing approaches try to explicitly disentangle content and attribute information, but this is difficult and often results in poor content-preservation and ungrammaticality. In contrast, we propose a simpler approach, Iterative Matching and Translation (IMaT), which: (1) constructs a pseudo-parallel corpus by aligning a subset of semantically similar sentences from the source and the target corpora; (2) applies a standard sequence-to-sequence model to learn the attribute transfer; (3) iteratively improves the learned transfer function by refining imperfections in the alignment. In sentiment modification and formality transfer tasks, our method outperforms complex state-of-the-art systems by a large margin. As an auxiliary contribution, we produce a publicly-available test set with human-generated transfer references.
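
Step (1), building a pseudo-parallel corpus by matching semantically similar sentences, can be sketched as below. As an assumption for illustration, a bag-of-words cosine similarity stands in for whatever sentence similarity IMaT actually uses, and the threshold and function names are hypothetical; steps (2) and (3), training the sequence-to-sequence model and iteratively refining the alignment, are not shown.

```python
import math
from collections import Counter


def bow_cosine(a, b):
    """Cosine similarity between lower-cased bag-of-words count vectors."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def build_pseudo_parallel(source, target, threshold):
    """Pair each source sentence with its most similar target sentence, keeping confident matches."""
    pairs = []
    for s in source:
        best = max(target, key=lambda t: bow_cosine(s, t))
        score = bow_cosine(s, best)
        if score >= threshold:  # drop weak alignments rather than train on them
            pairs.append((s, best, score))
    return pairs
```

Only a subset of source sentences ends up aligned, which matches the "subset of semantically similar sentences" described above.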

Dual Adversarial Neural Transfer for Low-Resource Named Entity Recognition
Joey Tianyi Zhou | Hao Zhang | Di Jin | Hongyuan Zhu | Meng Fang | Rick Siow Mong Goh | Kenneth Kwok
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We propose a new neural transfer method termed Dual Adversarial Transfer Network (DATNet) for addressing low-resource Named Entity Recognition (NER). Specifically, two variants of DATNet, i.e., DATNet-F and DATNet-P, are investigated to explore effective feature fusion between high- and low-resource data. To address the noisy and imbalanced training data, we propose a novel Generalized Resource-Adversarial Discriminator (GRAD). Additionally, adversarial training is adopted to boost model generalization. In experiments, we examine the effects of different components in DATNet across domains and languages and show that significant improvement can be obtained, especially for low-resource data, without any additional hand-crafted features or pre-trained language models.

2018

MIT-MEDG at SemEval-2018 Task 7: Semantic Relation Classification via Convolution Neural Network
Di Jin | Franck Dernoncourt | Elena Sergeeva | Matthew McDermott | Geeticka Chauhan
Proceedings of The 12th International Workshop on Semantic Evaluation

SemEval 2018 Task 7 asked participants to build a system that classifies the relation between two entities within a sentence into one of 6 possible relation types. We tested 3 classes of models: linear classifiers, Long Short-Term Memory (LSTM) models, and Convolutional Neural Network (CNN) models. Ultimately, the CNN model class proved most performant, so we specialized to this model for our final submissions. We improved performance beyond a vanilla CNN by including a variant of negative sampling, using custom word embeddings learned over a corpus of ACL articles, training over corpora of both subtasks 1.1 and 1.2, using reversed features, using some of the context words beyond the entity pairs, and using ensemble methods to improve our final predictions. We also tested attention-based pooling, up-sampling, and data augmentation, but none improved performance. Our model achieved rank 6 out of 28 (macro-averaged F1-score: 72.7) in subtask 1.1, and rank 4 out of 20 (macro F1: 80.6) in subtask 1.2.
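
The macro-averaged F1 reported above is the unweighted mean of per-class F1 scores, shown on a 0-100 scale. The sketch below computes it from gold and predicted labels; it is a generic definition offered for illustration, the official task scorer may differ in details, and the labels in the example are placeholders.

```python
def macro_f1(gold, pred, labels=None):
    """Unweighted mean of per-class F1, reported on a 0-100 scale."""
    if labels is None:
        labels = sorted(set(gold) | set(pred))
    per_class = []
    for c in labels:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class.append(f1)
    return 100.0 * sum(per_class) / len(per_class)
```

For gold ["A", "A", "B", "B"] and predictions ["A", "B", "B", "B"], class A has F1 2/3 and class B has F1 0.8, so the macro score is about 73.3; every class counts equally regardless of its frequency.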

PICO Element Detection in Medical Text via Long Short-Term Memory Neural Networks
Di Jin | Peter Szolovits
Proceedings of the BioNLP 2018 workshop

Successful evidence-based medicine (EBM) applications rely on answering clinical questions by analyzing large medical literature databases. In order to formulate a well-defined, focused clinical question, a framework called PICO is widely used, which identifies the sentences in a given medical text that belong to the four components: Participants/Problem (P), Intervention (I), Comparison (C) and Outcome (O). In this work, we present a Long Short-Term Memory (LSTM) neural network based model to automatically detect PICO elements. By jointly classifying subsequent sentences in the given text, we achieve state-of-the-art results on PICO element classification compared to several strong baseline models. We also make our curated data public as a benchmarking dataset so that the community can benefit from it.

Hierarchical Neural Networks for Sequential Sentence Classification in Medical Scientific Abstracts
Di Jin | Peter Szolovits
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Prevalent models based on artificial neural network (ANN) for sentence classification often classify sentences in isolation without considering the context in which sentences appear. This hampers the traditional sentence classification approaches to the problem of sequential sentence classification, where structured prediction is needed for better overall classification performance. In this work, we present a hierarchical sequential labeling network to make use of the contextual information within surrounding sentences to help classify the current sentence. Our model outperforms the state-of-the-art results by 2%-3% on two benchmarking datasets for sequential sentence classification in medical scientific abstracts.

Implicit Discourse Relation Recognition using Neural Tensor Network with Interactive Attention and Sparse Learning
Fengyu Guo | Ruifang He | Di Jin | Jianwu Dang | Longbiao Wang | Xiangang Li
Proceedings of the 27th International Conference on Computational Linguistics

Implicit discourse relation recognition aims to understand and annotate the latent relations between two discourse arguments, such as temporal and comparison relations. Most previous methods encode the two discourse arguments separately, and those that do consider pair-specific clues ignore the bidirectional interactions between the two arguments and the sparsity of pair patterns. In this paper, we propose a novel neural Tensor network framework with Interactive Attention and Sparse Learning (TIASL) for implicit discourse relation recognition. (1) We mine the most correlated word pairs from two discourse arguments to model pair-specific clues, and integrate them as interactive attention into argument representations produced by the bidirectional long short-term memory network. Meanwhile, (2) the neural tensor network with sparse constraint is proposed to explore the deeper and the more important pair patterns so as to fully recognize discourse relations. The experimental results on PDTB show that our proposed TIASL framework is effective.

Interaction-Aware Topic Model for Microblog Conversations through Network Embedding and User Attention
Ruifang He | Xuefei Zhang | Di Jin | Longbiao Wang | Jianwu Dang | Xiangang Li
Proceedings of the 27th International Conference on Computational Linguistics

Traditional topic models are insufficient for topic extraction in social media. Existing methods only consider text information or simultaneously model the posts and the static characteristics of social media. They ignore that a user discusses diverse topics when dynamically interacting with different people. Moreover, people who talk about the same topic have different effects on the topic. In this paper, we propose an Interaction-Aware Topic Model (IATM) for microblog conversations by integrating network embedding and user attention. A conversation network linking users based on reposting and replying relationships is constructed to mine the dynamic user behaviours. We model dynamic interactions and user attention so as to learn interaction-aware edge embeddings with social context. Then they are incorporated into neural variational inference for generating more consistent topics. The experiments on three real-world datasets show that our proposed model is effective.

2017

PurdueNLP at SemEval-2017 Task 1: Predicting Semantic Textual Similarity with Paraphrase and Event Embeddings
I-Ta Lee | Mahak Goindani | Chang Li | Di Jin | Kristen Marie Johnson | Xiao Zhang | Maria Leonor Pacheco | Dan Goldwasser
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper describes our proposed solution for SemEval 2017 Task 1: Semantic Textual Similarity (Cer et al., 2017). The task aims at measuring the degree of equivalence between sentences given in English. Performance is evaluated by computing Pearson correlation scores between the predicted scores and human judgements. Our proposed system consists of two subsystems and one regression model for predicting STS scores. The two subsystems are designed to learn Paraphrase and Event Embeddings that take paraphrasing characteristics and sentence structures into account. The regression model associates these embeddings to make the final predictions. The experimental result shows that our system achieves a Pearson correlation of 0.8 in this task.
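
The evaluation metric can be made concrete with a short Pearson correlation function over predicted and gold similarity scores. This is the textbook definition offered for illustration; the official task scorer is not reproduced, and the function name is hypothetical.

```python
import math


def pearson(predicted, gold):
    """Pearson correlation between predicted and human similarity scores."""
    n = len(predicted)
    mp, mg = sum(predicted) / n, sum(gold) / n
    cov = sum((p - mp) * (g - mg) for p, g in zip(predicted, gold))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    sg = math.sqrt(sum((g - mg) ** 2 for g in gold))
    if sp == 0.0 or sg == 0.0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (sp * sg)
```

Because the metric is invariant to shifting and positively rescaling the predictions, a system is rewarded for ranking sentence pairs consistently with human judgements rather than for matching the exact score scale.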

Leveraging Behavioral and Social Information for Weakly Supervised Collective Classification of Political Discourse on Twitter
Kristen Johnson | Di Jin | Dan Goldwasser
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Framing is a political strategy in which politicians carefully word their statements in order to control public perception of issues. Previous works exploring political framing typically analyze frame usage in longer texts, such as congressional speeches. We present a collection of weakly supervised models which harness collective classification to predict the frames used in political discourse on the microblogging platform, Twitter. Our global probabilistic models show that by combining both lexical features of tweets and network-based behavioral features of Twitter, we are able to increase the average, unsupervised F1 score by 21.52 points over a lexical baseline alone.

2016

Adapting Event Embedding for Implicit Discourse Relation Recognition
Maria Leonor Pacheco | I-Ta Lee | Xiao Zhang | Abdullah Khan Zehady | Pranjal Daga | Di Jin | Ayush Parolia | Dan Goldwasser
Proceedings of the CoNLL-16 shared task