Diyi Yang


2022

pdf bib
SUBS: Subtree Substitution for Compositional Semantic Parsing
Jingfeng Yang | Le Zhang | Diyi Yang
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Although sequence-to-sequence models often achieve good performance in semantic parsing for i.i.d. data, their performance is still inferior in compositional generalization. Several data augmentation methods have been proposed to alleviate this problem. However, prior work only leveraged superficial grammar or rules for data augmentation, which resulted in limited improvement. We propose to use subtree substitution for compositional data augmentation, where we consider subtrees with similar semantic functions as exchangeable. Our experiments showed that such augmented data led to significantly better performance on Scan and GeoQuery, and reached new SOTA on compositional split of GeoQuery.

pdf bib
Explaining Toxic Text via Knowledge Enhanced Text Generation
Rohit Sridhar | Diyi Yang
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Warning: This paper contains content that is offensive and may be upsetting.Biased or toxic speech can be harmful to various demographic groups. Therefore, it is not only important for models to detect these speech, but to also output explanations of why a given text is toxic. Previous literature has mostly focused on classifying and detecting toxic speech, and existing efforts on explaining stereotypes in toxic speech mainly use standard text generation approaches, resulting in generic and repetitive explanations. Building on these prior works, we introduce a novel knowledge-informed encoder-decoder framework to utilize multiple knowledge sources to generate implications of biased text.Experiments show that our knowledge informed models outperform prior state-of-the-art models significantly, and can generate detailed explanations of stereotypes in toxic speech compared to baselines, both quantitatively and qualitatively.

pdf bib
Measure and Improve Robustness in NLP Models: A Survey
Xuezhi Wang | Haohan Wang | Diyi Yang
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

As NLP models achieved state-of-the-art performances over benchmarks and gained wide applications, it has been increasingly important to ensure the safe deployment of these models in the real world, e.g., making sure the models are robust against unseen or challenging scenarios. Despite robustness being an increasingly studied topic, it has been separately explored in applications like vision and NLP, with various definitions, evaluation and mitigation strategies in multiple lines of research. In this paper, we aim to provide a unifying survey of how to define, measure and improve robustness in NLP. We first connect multiple definitions of robustness, then unify various lines of work on identifying robustness failures and evaluating models’ robustness. Correspondingly, we present mitigation strategies that are data-driven, model-driven, and inductive-prior-based, with a more systematic view of how to effectively improve robustness in NLP models. Finally, we conclude by outlining open challenges and future directions to motivate further research in this area.

pdf bib
TreeMix: Compositional Constituency-based Data Augmentation for Natural Language Understanding
Le Zhang | Zichao Yang | Diyi Yang
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Data augmentation is an effective approach to tackle over-fitting. Many previous works have proposed different data augmentations strategies for NLP, such as noise injection, word replacement, back-translation etc. Though effective, they missed one important characteristic of language–compositionality, meaning of a complex expression is built from its sub-parts. Motivated by this, we propose a compositional data augmentation approach for natural language understanding called TreeMix. Specifically, TreeMix leverages constituency parsing tree to decompose sentences into constituent sub-structures and the Mixup data augmentation technique to recombine them to generate new sentences. Compared with previous approaches, TreeMix introduces greater diversity to the samples generated and encourages models to learn compositionality of NLP data. Extensive experiments on text classification and SCAN demonstrate that TreeMix outperforms current state-of-the-art data augmentation methods.

pdf bib
Fantastic Questions and Where to Find Them: FairytaleQA – An Authentic Dataset for Narrative Comprehension
Ying Xu | Dakuo Wang | Mo Yu | Daniel Ritchie | Bingsheng Yao | Tongshuang Wu | Zheng Zhang | Toby Li | Nora Bradford | Branda Sun | Tran Hoang | Yisi Sang | Yufang Hou | Xiaojuan Ma | Diyi Yang | Nanyun Peng | Zhou Yu | Mark Warschauer
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Question answering (QA) is a fundamental means to facilitate assessment and training of narrative comprehension skills for both machines and young children, yet there is scarcity of high-quality QA datasets carefully designed to serve this purpose. In particular, existing datasets rarely distinguish fine-grained reading skills, such as the understanding of varying narrative elements. Drawing on the reading education research, we introduce FairytaleQA, a dataset focusing on narrative comprehension of kindergarten to eighth-grade students. Generated by educational experts based on an evidence-based theoretical framework, FairytaleQA consists of 10,580 explicit and implicit questions derived from 278 children-friendly stories, covering seven types of narrative elements or relations. Our dataset is valuable in two folds: First, we ran existing QA models on our dataset and confirmed that this annotation helps assess models’ fine-grained learning skills. Second, the dataset supports question generation (QG) task in the education domain. Through benchmarking with QG models, we show that the QG model trained on FairytaleQA is capable of asking high-quality and more diverse questions.

pdf bib
Continual Sequence Generation with Adaptive Compositional Modules
Yanzhe Zhang | Xuezhi Wang | Diyi Yang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Continual learning is essential for real-world deployment when there is a need to quickly adapt the model to new tasks without forgetting knowledge of old tasks. Existing work on continual sequence generation either always reuses existing parameters to learn new tasks, which is vulnerable to catastrophic forgetting on dissimilar tasks, or blindly adds new parameters for every new task, which could prevent knowledge sharing between similar tasks. To get the best of both worlds, in this work, we propose continual sequence generation with adaptive compositional modules to adaptively add modules in transformer architectures and compose both old and new modules for new tasks. We also incorporate pseudo experience replay to facilitate knowledge transfer in those shared modules. Experiment results on various sequences of generation tasks show that our framework can adaptively add modules or reuse modules based on task similarity, outperforming state-of-the-art baselines in terms of both performance and parameter efficiency. We make our code public at https://github.com/GT-SALT/Adaptive-Compositional-Modules.

pdf bib
Inducing Positive Perspectives with Text Reframing
Caleb Ziems | Minzhi Li | Anthony Zhang | Diyi Yang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Sentiment transfer is one popular example of a text style transfer task, where the goal is to reverse the sentiment polarity of a text. With a sentiment reversal comes also a reversal in meaning. We introduce a different but related task called positive reframing in which we neutralize a negative point of view and generate a more positive perspective for the author without contradicting the original meaning. Our insistence on meaning preservation makes positive reframing a challenging and semantically rich task. To facilitate rapid progress, we introduce a large-scale benchmark, Positive Psychology Frames, with 8,349 sentence pairs and 12,755 structured annotations to explain positive reframing in terms of six theoretically-motivated reframing strategies. Then we evaluate a set of state-of-the-art text style transfer models, and conclude by discussing key challenges and directions for future work.

pdf bib
VALUE: Understanding Dialect Disparity in NLU
Caleb Ziems | Jiaao Chen | Camille Harris | Jessica Anderson | Diyi Yang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

English Natural Language Understanding (NLU) systems have achieved great performances and even outperformed humans on benchmarks like GLUE and SuperGLUE. However, these benchmarks contain only textbook Standard American English (SAE). Other dialects have been largely overlooked in the NLP community. This leads to biased and inequitable NLU systems that serve only a sub-population of speakers. To understand disparities in current models and to facilitate more dialect-competent NLU systems, we introduce the VernAcular Language Understanding Evaluation (VALUE) benchmark, a challenging variant of GLUE that we created with a set of lexical and morphosyntactic transformation rules. In this initial release (V.1), we construct rules for 11 features of African American Vernacular English (AAVE), and we recruit fluent AAVE speakers to validate each feature transformation via linguistic acceptability judgments in a participatory design manner. Experiments show that these new dialectal features can lead to a drop in model performance.

pdf bib
The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems
Caleb Ziems | Jane Yu | Yi-Chia Wang | Alon Halevy | Diyi Yang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Conversational agents have come increasingly closer to human competence in open-domain dialogue settings; however, such models can reflect insensitive, hurtful, or entirely incoherent viewpoints that erode a user’s trust in the moral integrity of the system. Moral deviations are difficult to mitigate because moral judgments are not universal, and there may be multiple competing judgments that apply to a situation simultaneously. In this work, we introduce a new resource, not to authoritatively resolve moral ambiguities, but instead to facilitate systematic understanding of the intuitions, values and moral judgments reflected in the utterances of dialogue systems. The Moral Integrity Corpus, MIC, is such a resource, which captures the moral assumptions of 38k prompt-reply pairs, using 99k distinct Rules of Thumb (RoTs). Each RoT reflects a particular moral conviction that can explain why a chatbot’s reply may appear acceptable or problematic. We further organize RoTs with a set of 9 moral and social attributes and benchmark performance for attribute classification. Most importantly, we show that current neural language models can automatically generate new RoTs that reasonably describe previously unseen interactions, but they still struggle with certain scenarios. Our findings suggest that MIC will be a useful resource for understanding and language models’ implicit moral assumptions and flexibly benchmarking the integrity of conversational agents. To download the data, see https://github.com/GT-SALT/mic

pdf bib
DMix: Adaptive Distance-aware Interpolative Mixup
Ramit Sawhney | Megh Thakkar | Shrey Pandit | Ritesh Soun | Di Jin | Diyi Yang | Lucie Flek
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Interpolation-based regularisation methods such as Mixup, which generate virtual training samples, have proven to be effective for various tasks and modalities.We extend Mixup and propose DMix, an adaptive distance-aware interpolative Mixup that selects samples based on their diversity in the embedding space. DMix leverages the hyperbolic space as a similarity measure among input samples for a richer encoded representation.DMix achieves state-of-the-art results on sentence classification over existing data augmentation methods on 8 benchmark datasets across English, Arabic, Turkish, and Hindi languages while achieving benchmark F1 scores in 3 times less number of iterations.We probe the effectiveness of DMix in conjunction with various similarity measures and qualitatively analyze the different components.DMix being generalizable, can be applied to various tasks, models and modalities.

pdf bib
Learning with Limited Text Data
Diyi Yang | Ankur Parikh | Colin Raffel
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

Natural Language Processing (NLP) has achieved great progress in the past decade on the basis of neural models, which often make use of large amounts of labeled data to achieve state-of-the-art performance. The dependence on labeled data prevents NLP models from being applied to low-resource settings and languages because of the time, money, and expertise that is often required to label massive amounts of textual data. Consequently, the ability to learn with limited labeled data is crucial for deploying neural systems to real-world NLP applications. Recently, numerous approaches have been explored to alleviate the need for labeled data in NLP such as data augmentation and semi-supervised learning. This tutorial aims to provide a systematic and up-to-date overview of these methods in order to help researchers and practitioners understand the landscape of approaches and the challenges associated with learning from limited labeled data, an emerging topic in the computational linguistics community. We will consider applications to a wide variety of NLP tasks (including text classification, generation, and structured prediction) and will highlight current challenges and future directions.

pdf bib
Leveraging Expert Guided Adversarial Augmentation For Improving Generalization in Named Entity Recognition
Aaron Reich | Jiaao Chen | Aastha Agrawal | Yanzhe Zhang | Diyi Yang
Findings of the Association for Computational Linguistics: ACL 2022

Named Entity Recognition (NER) systems often demonstrate great performance on in-distribution data, but perform poorly on examples drawn from a shifted distribution. One way to evaluate the generalization ability of NER models is to use adversarial examples, on which the specific variations associated with named entities are rarely considered. To this end, we propose leveraging expert-guided heuristics to change the entity tokens and their surrounding contexts thereby altering their entity types as adversarial attacks. Using expert-guided heuristics, we augmented the CoNLL 2003 test set and manually annotated it to construct a high-quality challenging set. We found that state-of-the-art NER systems trained on CoNLL 2003 training data drop performance dramatically on our challenging set. By training on adversarial augmented training examples and using mixup for regularization, we were able to significantly improve the performance on the challenging set as well as improve out-of-domain generalization which we evaluated by using OntoNotes data. We have publicly released our dataset and code at https://github.com/GT-SALT/Guided-Adversarial-Augmentation.

pdf bib
Focus on the Action: Learning to Highlight and Summarize Jointly for Email To-Do Items Summarization
Kexun Zhang | Jiaao Chen | Diyi Yang
Findings of the Association for Computational Linguistics: ACL 2022

Automatic email to-do item generation is the task of generating to-do items from a given email to help people overview emails and schedule daily work. Different from prior research on email summarization, to-do item generation focuses on generating action mentions to provide more structured summaries of email text.Prior work either requires large amount of annotation for key sentences with potential actions or fails to pay attention to nuanced actions from these unstructured emails, and thus often lead to unfaithful summaries. To fill these gaps, we propose a simple and effective learning to highlight and summarize framework (LHS) to learn to identify the most salient text and actions, and incorporate these structured representations to generate more faithful to-do items. Experiments show that our LHS model outperforms the baselines and achieves the state-of-the-art performance in terms of both quantitative evaluation and human judgement. We also discussed specific challenges that current models faced with email to-do summarization.

pdf bib
SEQZERO: Few-shot Compositional Semantic Parsing with Sequential Prompts and Zero-shot Models
Jingfeng Yang | Haoming Jiang | Qingyu Yin | Danqing Zhang | Bing Yin | Diyi Yang
Findings of the Association for Computational Linguistics: NAACL 2022

Recent research showed promising results on combining pretrained language models (LMs) with canonical utterance for few-shot semantic parsing.The canonical utterance is often lengthy and complex due to the compositional structure of formal languages. Learning to generate such canonical utterance requires significant amount of data to reach high performance. Fine-tuning with only few-shot samples, the LMs can easily forget pretrained knowledge, overfit spurious biases, and suffer from compositionally out-of-distribution generalization errors. To tackle these issues, we propose a novel few-shot semantic parsing method – SEQZERO. SEQZERO decomposes the problem into a sequence of sub-problems, which corresponds to the sub-clauses of the formal language. Based on the decomposition, the LMs only need to generate short answers using prompts for predicting sub-clauses. Thus, SEQZERO avoids generating a long canonical utterance at once. Moreover, SEQZERO employs not only a few-shot model but also a zero-shot model to alleviate the overfitting.In particular, SEQZERO brings out the merits from both models via ensemble equipped with our proposed constrained rescaling.SEQZERO achieves SOTA performance of BART-based models on GeoQuery and EcommerceQuery, which are two few-shot datasets with compositional data split.

pdf bib
Identifying and Mitigating Spurious Correlations for Improving Robustness in NLP Models
Tianlu Wang | Rohit Sridhar | Diyi Yang | Xuezhi Wang
Findings of the Association for Computational Linguistics: NAACL 2022

Recently, NLP models have achieved remarkable progress across a variety of tasks; however, they have also been criticized for being not robust. Many robustness problems can be attributed to models exploiting “spurious correlations”, or “shortcuts” between the training data and the task labels. Most existing work identifies a limited set of task-specific shortcuts via human priors or error analyses, which requires extensive expertise and efforts. In this paper, we aim to automatically identify such spurious correlations in NLP models at scale. We first leverage existing interpretability methods to extract tokens that significantly affect model’s decision process from the input text. We then distinguish “genuine” tokens and “spurious” tokens by analyzing model predictions across multiple corpora and further verify them through knowledge-aware perturbations. We show that our proposed method can effectively and efficiently identify a scalable set of “shortcuts”, and mitigating these leads to more robust models in multiple applications.

2021

pdf bib
Semantic Categorization of Social Knowledge for Commonsense Question Answering
Gengyu Wang | Xiaochen Hou | Diyi Yang | Kathleen McKeown | Jing Huang
Proceedings of the Second Workshop on Simple and Efficient Natural Language Processing

Large pre-trained language models (PLMs) have led to great success on various commonsense question answering (QA) tasks in an end-to-end fashion. However, little attention has been paid to what commonsense knowledge is needed to deeply characterize these QA tasks. In this work, we proposed to categorize the semantics needed for these tasks using the SocialIQA as an example. Building upon our labeled social knowledge categories dataset on top of SocialIQA, we further train neural QA models to incorporate such social knowledge categories and relation information from a knowledge base. Unlike previous work, we observe our models with semantic categorizations of social knowledge can achieve comparable performance with a relatively simple model and smaller size compared to other complex approaches.

pdf bib
Proceedings of the First Workshop on Causal Inference and NLP
Amir Feder | Katherine Keith | Emaad Manzoor | Reid Pryzant | Dhanya Sridhar | Zach Wood-Doughty | Jacob Eisenstein | Justin Grimmer | Roi Reichart | Molly Roberts | Uri Shalit | Brandon Stewart | Victor Veitch | Diyi Yang
Proceedings of the First Workshop on Causal Inference and NLP

pdf bib
Putting Humans in the Natural Language Processing Loop: A Survey
Zijie J. Wang | Dongjin Choi | Shenyu Xu | Diyi Yang
Proceedings of the First Workshop on Bridging Human–Computer Interaction and Natural Language Processing

How can we design Natural Language Processing (NLP) systems that learn from human feedback? There is a growing research body of Human-in-the-loop (HITL) NLP frameworks that continuously integrate human feedback to improve the model itself. HITL NLP research is nascent but multifarious—solving various NLP problems, collecting diverse feedback from different people, and applying different methods to learn from human feedback. We present a survey of HITL NLP work from both Machine Learning (ML) and Human-computer Interaction (HCI) communities that highlights its short yet inspiring history, and thoroughly summarize recent frameworks focusing on their tasks, goals, human interactions, and feedback learning methods. Finally, we discuss future studies for integrating human feedback in the NLP development loop.

pdf bib
HiddenCut: Simple Data Augmentation for Natural Language Understanding with Better Generalizability
Jiaao Chen | Dinghan Shen | Weizhu Chen | Diyi Yang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Fine-tuning large pre-trained models with task-specific data has achieved great success in NLP. However, it has been demonstrated that the majority of information within the self-attention networks is redundant and not utilized effectively during the fine-tuning stage. This leads to inferior results when generalizing the obtained models to out-of-domain distributions. To this end, we propose a simple yet effective data augmentation technique, HiddenCut, to better regularize the model and encourage it to learn more generalizable features. Specifically, contiguous spans within the hidden space are dynamically and strategically dropped during training. Experiments show that our HiddenCut method outperforms the state-of-the-art augmentation methods on the GLUE benchmark, and consistently exhibits superior generalization performances on out-of-distribution and challenging counterexamples. We have publicly released our code at https://github.com/GT-SALT/HiddenCut.

pdf bib
Disfl-QA: A Benchmark Dataset for Understanding Disfluencies in Question Answering
Aditya Gupta | Jiacheng Xu | Shyam Upadhyay | Diyi Yang | Manaal Faruqui
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
To Protect and To Serve? Analyzing Entity-Centric Framing of Police Violence
Caleb Ziems | Diyi Yang
Findings of the Association for Computational Linguistics: EMNLP 2021

Framing has significant but subtle effects on public opinion and policy. We propose an NLP framework to measure entity-centric frames. We use it to understand media coverage on police violence in the United States in a new Police Violence Frames Corpus of 82k news articles spanning 7k police killings. Our work uncovers more than a dozen framing devices and reveals significant differences in the way liberal and conservative news sources frame both the issue of police violence and the entities involved. Conservative sources emphasize when the victim is armed or attacking an officer and are more likely to mention the victim’s criminal record. Liberal sources focus more on the underlying systemic injustice, highlighting the victim’s race and that they were unarmed. We discover temporary spikes in these injustice frames near high-profile shooting events, and finally, we show protest volume correlates with and precedes media framing decisions.

pdf bib
WIKIBIAS: Detecting Multi-Span Subjective Biases in Language
Yang Zhong | Jingfeng Yang | Wei Xu | Diyi Yang
Findings of the Association for Computational Linguistics: EMNLP 2021

Biases continue to be prevalent in modern text and media, especially subjective bias – a special type of bias that introduces improper attitudes or presents a statement with the presupposition of truth. To tackle the problem of detecting and further mitigating subjective bias, we introduce a manually annotated parallel corpus WIKIBIAS with more than 4,000 sentence pairs from Wikipedia edits. This corpus contains annotations towards both sentence-level bias types and token-level biased segments. We present systematic analyses of our dataset and results achieved by a set of state-of-the-art baselines in terms of three tasks: bias classification, tagging biased segments, and neutralizing biased text. We find that current models still struggle with detecting multi-span biases despite their reasonable performances, suggesting that our dataset can serve as a useful research benchmark. We also demonstrate that models trained on our dataset can generalize well to multiple domains such as news and political speeches.

pdf bib
The First Workshop on Evaluations and Assessments of Neural Conversation Systems
Wei Wei | Bo Dai | Tuo Zhao | Lihong Li | Diyi Yang | Yun-Nung Chen | Y-Lan Boureau | Asli Celikyilmaz | Alborz Geramifard | Aman Ahuja | Haoming Jiang
The First Workshop on Evaluations and Assessments of Neural Conversation Systems

pdf bib
Tuiteamos o pongamos un tuit? Investigating the Social Constraints of Loanword Integration in Spanish Social Media
Ian Stewart | Diyi Yang | Jacob Eisenstein
Proceedings of the Society for Computation in Linguistics 2021

pdf bib
The Importance of Modeling Social Factors of Language: Theory and Practice
Dirk Hovy | Diyi Yang
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Natural language processing (NLP) applications are now more powerful and ubiquitous than ever before. With rapidly developing (neural) models and ever-more available data, current NLP models have access to more information than any human speaker during their life. Still, it would be hard to argue that NLP models have reached human-level capacity. In this position paper, we argue that the reason for the current limitations is a focus on information content while ignoring language’s social factors. We show that current NLP systems systematically break down when faced with interpreting the social factors of language. This limits applications to a subset of information-related tasks and prevents NLP from reaching human-level performance. At the same time, systems that incorporate even a minimum of social factors already show remarkable improvements. We formalize a taxonomy of seven social factors based on linguistic theory and exemplify current failures and emerging successes for each of them. We suggest that the NLP community address social factors to get closer to the goal of human-like language understanding.

pdf bib
Structure-Aware Abstractive Conversation Summarization via Discourse and Action Graphs
Jiaao Chen | Diyi Yang
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Abstractive conversation summarization has received much attention recently. However, these generated summaries often suffer from insufficient, redundant, or incorrect content, largely due to the unstructured and complex characteristics of human-human interactions. To this end, we propose to explicitly model the rich structures in conversations for more precise and accurate conversation summarization, by first incorporating discourse relations between utterances and action triples (“who-doing-what”) in utterances through structured graphs to better encode conversations, and then designing a multi-granularity decoder to generate summaries by combining all levels of information. Experiments show that our proposed models outperform state-of-the-art methods and generalize well in other domains in terms of both automatic evaluations and human judgments. We have publicly released our code at https://github.com/GT-SALT/Structure-Aware-BART.

pdf bib
Personalized Response Generation via Generative Split Memory Network
Yuwei Wu | Xuezhe Ma | Diyi Yang
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Despite the impressive successes of generation and dialogue systems, how to endow a text generation system with particular personality traits to deliver more personalized responses remains under-investigated. In this work, we look at how to generate personalized responses for questions on Reddit by utilizing personalized user profiles and posting histories. Specifically, we release an open-domain single-turn dialog dataset made up of 1.5M conversation pairs together with 300k profiles of users and related comments. We then propose a memory network to generate personalized responses in dialogue that utilizes a novel mechanism of splitting memories: one for user profile meta attributes and the other for user-generated information like comment histories. Experimental results show the quantitative and qualitative improvements of our simple split memory network model over the state-of-the-art response generation baselines.

pdf bib
Continual Learning for Text Classification with Information Disentanglement Based Regularization
Yufan Huang | Yanzhe Zhang | Jiaao Chen | Xuezhi Wang | Diyi Yang
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Continual learning has become increasingly important as it enables NLP models to constantly learn and gain knowledge over time. Previous continual learning methods are mainly designed to preserve knowledge from previous tasks, without much emphasis on how to well generalize models to new tasks. In this work, we propose an information disentanglement based regularization method for continual learning on text classification. Our proposed method first disentangles text hidden spaces into representations that are generic to all tasks and representations specific to each individual task, and further regularizes these representations differently to better constrain the knowledge required to generalize. We also introduce two simple auxiliary tasks: next sentence prediction and task-id prediction, for learning better generic and specific representation spaces. Experiments conducted on large-scale benchmarks demonstrate the effectiveness of our method in continual text classification tasks with various sequences and lengths over state-of-the-art baselines. We have publicly released our code at https://github.com/GT-SALT/IDBR.

pdf bib
Latent Hatred: A Benchmark for Understanding Implicit Hate Speech
Mai ElSherief | Caleb Ziems | David Muchlinski | Vaishnavi Anupindi | Jordyn Seybolt | Munmun De Choudhury | Diyi Yang
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Hate speech has grown significantly on social media, causing serious consequences for victims of all demographics. Despite much attention being paid to characterize and detect discriminatory speech, most work has focused on explicit or overt hate speech, failing to address a more pervasive form based on coded or indirect language. To fill this gap, this work introduces a theoretically-justified taxonomy of implicit hate speech and a benchmark corpus with fine-grained labels for each message and its implication. We present systematic analyses of our dataset using contemporary baselines to detect and explain implicit hate speech, and we discuss key features that challenge existing models. This dataset will continue to serve as a useful benchmark for understanding this multifaceted issue.

pdf bib
Frustratingly Simple but Surprisingly Strong: Using Language-Independent Features for Zero-shot Cross-lingual Semantic Parsing
Jingfeng Yang | Federico Fancellu | Bonnie Webber | Diyi Yang
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

The availability of corpora has led to significant advances in training semantic parsers in English. Unfortunately, for languages other than English, annotated data is limited and so is the performance of the developed parsers. Recently, pretrained multilingual models have been proven useful for zero-shot cross-lingual transfer in many NLP tasks. What else does it require to apply a parser trained in English to other languages for zero-shot cross-lingual semantic parsing? Will simple language-independent features help? To this end, we experiment with six Discourse Representation Structure (DRS) semantic parsers in English, and generalize them to Italian, German and Dutch, where there are only a small number of manually annotated parses available. Extensive experiments show that despite its simplicity, adding Universal Dependency (UD) relations and Universal POS tags (UPOS) as model-agnostic features achieves surprisingly strong improvement on all parsers.

pdf bib
Simple Conversational Data Augmentation for Semi-supervised Abstractive Dialogue Summarization
Jiaao Chen | Diyi Yang
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Abstractive conversation summarization has received growing attention while most current state-of-the-art summarization models heavily rely on human-annotated summaries. To reduce the dependence on labeled summaries, in this work, we present a simple yet effective set of Conversational Data Augmentation (CODA) methods for semi-supervised abstractive conversation summarization, such as random swapping/deletion to perturb the discourse relations inside conversations, dialogue-acts-guided insertion to interrupt the development of conversations, and conditional-generation-based substitution to substitute utterances with their paraphrases generated based on the conversation context. To further utilize unlabeled conversations, we combine CODA with two-stage noisy self-training where we first pre-train the summarization model on unlabeled conversations with pseudo summaries and then fine-tune it on labeled conversations. Experiments conducted on the recent conversation summarization datasets demonstrate the effectiveness of our methods over several state-of-the-art data augmentation baselines.

pdf bib
HypMix: Hyperbolic Interpolative Data Augmentation
Ramit Sawhney | Megh Thakkar | Shivam Agarwal | Di Jin | Diyi Yang | Lucie Flek
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Interpolation-based regularisation methods for data augmentation have proven to be effective for various tasks and modalities. These methods involve performing mathematical operations over the raw input samples or their latent states representations - vectors that often possess complex hierarchical geometries. However, these operations are performed in the Euclidean space, simplifying these representations, which may lead to distorted and noisy interpolations. We propose HypMix, a novel model-, data-, and modality-agnostic interpolative data augmentation technique operating in the hyperbolic space, which captures the complex geometry of input and hidden state hierarchies better than its contemporaries. We evaluate HypMix on benchmark and low resource datasets across speech, text, and vision modalities, showing that HypMix consistently outperforms state-of-the-art data augmentation techniques. In addition, we demonstrate the use of HypMix in semi-supervised settings. We further probe into the adversarial robustness and qualitative inferences we draw from HypMix that elucidate the efficacy of the Riemannian hyperbolic manifolds for interpolation-based data augmentation.

pdf bib
Personalized Response Generation with Tensor Factorization
Zhenghui Wang | Lingxiao Luo | Diyi Yang
Proceedings of the 1st Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021)

Personalized response generation is essential for more human-like conversations. However, how to model user personalization information with no explicit user persona descriptions or demographics still remains under-investigated. To tackle the data sparsity problem and the huge number of users, we utilize tensor factorization to model users’ personalization information with their posting histories. Specifically, we introduce the personalized response embedding for all question-user pairs and form them into a three-mode tensor, decomposed by Tucker decomposition. The personalized response embedding is fed to either the decoder of an LSTM-based Seq2Seq model or a transformer language model to help generate more personalized responses. To evaluate how personalized the generated responses are, we further propose a novel ranking-based metric called Per-Hits@k which measures how likely are the generated responses come from the corresponding users. Results on a large-scale conversation dataset show that our proposed tensor factorization based models generate more personalized and higher quality responses compared to baselines.

pdf bib
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
Sebastian Gehrmann | Tosin Adewumi | Karmanya Aggarwal | Pawan Sasanka Ammanamanchi | Anuoluwapo Aremu | Antoine Bosselut | Khyathi Raghavi Chandu | Miruna-Adriana Clinciu | Dipanjan Das | Kaustubh Dhole | Wanyu Du | Esin Durmus | Ondřej Dušek | Chris Chinenye Emezue | Varun Gangal | Cristina Garbacea | Tatsunori Hashimoto | Yufang Hou | Yacine Jernite | Harsh Jhamtani | Yangfeng Ji | Shailza Jolly | Mihir Kale | Dhruv Kumar | Faisal Ladhak | Aman Madaan | Mounica Maddela | Khyati Mahajan | Saad Mahamood | Bodhisattwa Prasad Majumder | Pedro Henrique Martins | Angelina McMillan-Major | Simon Mille | Emiel van Miltenburg | Moin Nadeem | Shashi Narayan | Vitaly Nikolaev | Andre Niyongabo Rubungo | Salomey Osei | Ankur Parikh | Laura Perez-Beltrachini | Niranjan Ramesh Rao | Vikas Raunak | Juan Diego Rodriguez | Sashank Santhanam | João Sedoc | Thibault Sellam | Samira Shaikh | Anastasia Shimorina | Marco Antonio Sobrevilla Cabezudo | Hendrik Strobelt | Nishant Subramani | Wei Xu | Diyi Yang | Akhila Yerukola | Jiawei Zhou
Proceedings of the 1st Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021)

We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly evolving ecosystem of automated metrics, datasets, and human evaluation standards. Due to this moving target, new models often still evaluate on divergent anglo-centric corpora with well-established, but flawed, metrics. This disconnect makes it challenging to identify the limitations of current models and opportunities for progress. Addressing this limitation, GEM provides an environment in which models can easily be applied to a wide set of tasks and in which evaluation strategies can be tested. Regular updates to the benchmark will help NLG research become more multilingual and evolve the challenge alongside models. This paper serves as the description of the data for the 2021 shared task at the associated GEM Workshop.

2020

pdf bib
ToTTo: A Controlled Table-To-Text Generation Dataset
Ankur Parikh | Xuezhi Wang | Sebastian Gehrmann | Manaal Faruqui | Bhuwan Dhingra | Diyi Yang | Dipanjan Das
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We present ToTTo, an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: given a Wikipedia table and a set of highlighted table cells, produce a one-sentence description. To obtain generated targets that are natural but also faithful to the source table, we introduce a dataset construction process where annotators directly revise existing candidate sentences from Wikipedia. We present systematic analyses of our dataset and annotation process as well as results achieved by several state-of-the-art baselines. While usually fluent, existing methods often hallucinate phrases that are not supported by the table, suggesting that this dataset can serve as a useful research benchmark for high-precision conditional text generation.

pdf bib
Local Additivity Based Data Augmentation for Semi-supervised NER
Jiaao Chen | Zhenghui Wang | Ran Tian | Zichao Yang | Diyi Yang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Named Entity Recognition (NER) is one of the first stages in deep language understanding yet current NER models heavily rely on human-annotated data. In this work, to alleviate the dependence on labeled data, we propose a Local Additivity based Data Augmentation (LADA) method for semi-supervised NER, in which we create virtual samples by interpolating sequences close to each other. Our approach has two variations: Intra-LADA and Inter-LADA, where Intra-LADA performs interpolations among tokens within one sentence, and Inter-LADA samples different sentences to interpolate. Through linear additions between sampled training data, LADA creates an infinite amount of labeled data and improves both entity and context learning. We further extend LADA to the semi-supervised setting by designing a novel consistency loss for unlabeled data. Experiments conducted on two NER benchmarks demonstrate the effectiveness of our methods over several strong baselines. We have publicly released our code at https://github.com/GT-SALT/LADA

pdf bib
Planning and Generating Natural and Diverse Disfluent Texts as Augmentation for Disfluency Detection
Jingfeng Yang | Diyi Yang | Zhaoran Ma
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Existing approaches to disfluency detection heavily depend on human-annotated data. Numbers of data augmentation methods have been proposed to alleviate the dependence on labeled data. However, current augmentation approaches such as random insertion or repetition fail to resemble training corpus well and usually resulted in unnatural and limited types of disfluencies. In this work, we propose a simple Planner-Generator based disfluency generation model to generate natural and diverse disfluent texts as augmented data, where the Planner decides on where to insert disfluent segments and the Generator follows the prediction to generate corresponding disfluent segments. We further utilize this augmented data for pretraining and leverage it for the task of disfluency detection. Experiments demonstrated that our two-stage disfluency generation model outperforms existing baselines; those disfluent sentences generated significantly aided the task of disfluency detection and led to state-of-the-art performance on Switchboard corpus.

pdf bib
Multi-View Sequence-to-Sequence Models with Conversational Structure for Abstractive Dialogue Summarization
Jiaao Chen | Diyi Yang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Text summarization is one of the most challenging and interesting problems in NLP. Although much attention has been paid to summarizing structured text like news reports or encyclopedia articles, summarizing conversations—an essential part of human-human/machine interaction where most important pieces of information are scattered across various utterances of different speakers—remains relatively under-investigated. This work proposes a multi-view sequence-to-sequence model by first extracting conversational structures of unstructured daily chats from different views to represent conversations and then utilizing a multi-view decoder to incorporate different views to generate dialogue summaries. Experiments on a large-scale dialogue summarization corpus demonstrated that our methods significantly outperformed previous state-of-the-art models via both automatic evaluations and human judgment. We also discussed specific challenges that current approaches faced with this task. We have publicly released our code at https://github.com/GT-SALT/Multi-View-Seq2Seq.

pdf bib
MixText: Linguistically-Informed Interpolation of Hidden Space for Semi-Supervised Text Classification
Jiaao Chen | Zichao Yang | Diyi Yang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

This paper presents MixText, a semi-supervised learning method for text classification, which uses our newly designed data augmentation method called TMix. TMix creates a large amount of augmented training samples by interpolating text in hidden space. Moreover, we leverage recent advances in data augmentation to guess low-entropy labels for unlabeled data, hence making them as easy to use as labeled data. By mixing labeled, unlabeled and augmented data, MixText significantly outperformed current pre-trained and fined-tuned models and other state-of-the-art semi-supervised learning methods on several text classification benchmarks. The improvement is especially prominent when supervision is extremely limited. We have publicly released our code at https://github.com/GT-SALT/MixText.

pdf bib
Examining the Ordering of Rhetorical Strategies in Persuasive Requests
Omar Shaikh | Jiaao Chen | Jon Saad-Falcon | Polo Chau | Diyi Yang
Findings of the Association for Computational Linguistics: EMNLP 2020

Interpreting how persuasive language influences audiences has implications across many domains like advertising, argumentation, and propaganda. Persuasion relies on more than a message’s content. Arranging the order of the message itself (i.e., ordering specific rhetorical strategies) also plays an important role. To examine how strategy orderings contribute to persuasiveness, we first utilize a Variational Autoencoder model to disentangle content and rhetorical strategies in textual requests from a large-scale loan request corpus. We then visualize interplay between content and strategy through an attentional LSTM that predicts the success of textual requests. We find that specific (orderings of) strategies interact uniquely with a request’s content to impact success rate, and thus the persuasiveness of a request.

pdf bib
Semi-supervised Formality Style Transfer using Language Model Discriminator and Mutual Information Maximization
Kunal Chawla | Diyi Yang
Findings of the Association for Computational Linguistics: EMNLP 2020

Formality style transfer is the task of converting informal sentences to grammatically-correct formal sentences, which can be used to improve performance of many downstream NLP tasks. In this work, we propose a semi-supervised formality style transfer model that utilizes a language model-based discriminator to maximize the likelihood of the output sentence being formal, which allows us to use maximization of token-level conditional probabilities for training. We further propose to maximize mutual information between source and target styles as our training objective instead of maximizing the regular likelihood that often leads to repetitive and trivial generated responses. Experiments showed that our model outperformed previous state-of-the-art baselines significantly in terms of both automated metrics and human judgement. We further generalized our model to unsupervised text style transfer task, and achieved significant improvements on two benchmark sentiment style transfer datasets.

pdf bib
“This is a Problem, Don’t You Agree?” Framing and Bias in Human Evaluation for Natural Language Generation
Stephanie Schoch | Diyi Yang | Yangfeng Ji
Proceedings of the 1st Workshop on Evaluating NLG Evaluation

Despite recent efforts reviewing current human evaluation practices for natural language generation (NLG) research, the lack of reported question wording and potential for framing effects or cognitive biases influencing results has been widely overlooked. In this opinion paper, we detail three possible framing effects and cognitive biases that could be imposed on human evaluation in NLG. Based on this, we make a call for increased transparency for human evaluation in NLG and propose the concept of human evaluation statements. We make several recommendations for design details to report that could potentially influence results, such as question wording, and suggest that reporting pertinent design details can help increase comparability across studies as well as reproducibility of results.

2019

pdf bib
Let’s Make Your Request More Persuasive: Modeling Persuasive Strategies via Semi-Supervised Neural Nets on Crowdfunding Platforms
Diyi Yang | Jiaao Chen | Zichao Yang | Dan Jurafsky | Eduard Hovy
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Modeling what makes a request persuasive - eliciting the desired response from a reader - is critical to the study of propaganda, behavioral economics, and advertising. Yet current models can’t quantify the persuasiveness of requests or extract successful persuasive strategies. Building on theories of persuasion, we propose a neural network to quantify persuasiveness and identify the persuasive strategies in advocacy requests. Our semi-supervised hierarchical neural network model is supervised by the number of people persuaded to take actions and partially supervised at the sentence level with human-labeled rhetorical strategies. Our method outperforms several baselines, uncovers persuasive strategies - offering increased interpretability of persuasive speech - and has applications for other situations with document-level supervision but only partial sentence supervision.

pdf bib
Proceedings of the 2019 Workshop on Widening NLP
Amittai Axelrod | Diyi Yang | Rossana Cunha | Samira Shaikh | Zeerak Waseem
Proceedings of the 2019 Workshop on Widening NLP

2017

pdf bib
Identifying Semantic Edit Intentions from Revisions in Wikipedia
Diyi Yang | Aaron Halfaker | Robert Kraut | Eduard Hovy
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Most studies on human editing focus merely on syntactic revision operations, failing to capture the intentions behind revision changes, which are essential for facilitating the single and collaborative writing process. In this work, we develop in collaboration with Wikipedia editors a 13-category taxonomy of the semantic intention behind edits in Wikipedia articles. Using labeled article edits, we build a computational classifier of intentions that achieved a micro-averaged F1 score of 0.621. We use this model to investigate edit intention effectiveness: how different types of edits predict the retention of newcomers and changes in the quality of articles, two key concerns for Wikipedia today. Our analysis shows that the types of edits that users make in their first session predict their subsequent survival as Wikipedia editors, and articles in different stages need different types of edits.

2016

pdf bib
Hierarchical Attention Networks for Document Classification
Zichao Yang | Diyi Yang | Chris Dyer | Xiaodong He | Alex Smola | Eduard Hovy
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Edit Categories and Editor Role Identification in Wikipedia
Diyi Yang | Aaron Halfaker | Robert Kraut | Eduard Hovy
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this work, we introduced a corpus for categorizing edit types in Wikipedia. This fine-grained taxonomy of edit types enables us to differentiate editing actions and find editor roles in Wikipedia based on their low-level edit types. To do this, we first created an annotated corpus based on 1,996 edits obtained from 953 article revisions and built machine-learning models to automatically identify the edit categories associated with edits. Building on this automated measurement of edit types, we then applied a graphical model analogous to Latent Dirichlet Allocation to uncover the latent roles in editors’ edit histories. Applying this technique revealed eight different roles editors play, such as Social Networker, Substantive Expert, etc.

2015

pdf bib
Humor Recognition and Humor Anchor Extraction
Diyi Yang | Alon Lavie | Chris Dyer | Eduard Hovy
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
That’s So Annoying!!!: A Lexical and Frame-Semantic Embedding Based Data Augmentation Approach to Automatic Categorization of Annoying Behaviors using #petpeeve Tweets
William Yang Wang | Diyi Yang
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Incorporating Word Correlation Knowledge into Topic Modeling
Pengtao Xie | Diyi Yang | Eric Xing
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Weakly Supervised Role Identification in Teamwork Interactions
Diyi Yang | Miaomiao Wen | Carolyn Rosé
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2014

pdf bib
Towards Identifying the Resolvability of Threads in MOOCs
Diyi Yang | Miaomiao Wen | Carolyn Rose
Proceedings of the EMNLP 2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs

Search
Co-authors