Robustness and Adversarial Examples in Natural Language Processing

Recent studies show that many NLP systems are sensitive and vulnerable to small perturbations of their inputs and do not generalize well across datasets. This lack of robustness hinders the deployment of NLP systems in real-world applications. This tutorial aims to raise awareness of practical concerns about NLP robustness. It targets NLP researchers and practitioners who are interested in building reliable NLP systems. In particular, we will review recent studies analyzing the weaknesses of NLP systems when facing adversarial inputs and data with a distribution shift. We will provide the audience with a holistic view of 1) how to use adversarial examples to examine the weaknesses of NLP models and facilitate debugging; 2) how to enhance the robustness of existing NLP models and defend against adversarial inputs; and 3) how considerations of robustness affect the real-world NLP applications used in our daily lives. We will conclude the tutorial by outlining future research directions in this area.

Type of Tutorial: Cutting-edge.

Tutorial Description
Recent advances in data-driven machine learning techniques such as deep neural networks have revolutionized natural language processing (NLP). Modern NLP systems achieve outstanding performance on various tasks such as question answering, textual entailment, and language generation. In many cases, they even surpass inter-annotator agreement on benchmark datasets. It may be tempting to conclude from results on these datasets that current systems are as good as humans at these NLP tasks.
Despite this remarkable success, recent studies show that these systems often rely on spurious correlations and fail catastrophically when given inputs from different sources or inputs that have been adversarially perturbed. For example, Jia and Liang (2017) show that state-of-the-art reading comprehension systems fail to answer questions about paragraphs that contain adversarially inserted sentences, which are automatically generated to distract computer systems without changing the correct answer. Similarly, a series of studies (e.g., Ribeiro et al., 2018; Alzantot et al., 2018; Iyyer et al., 2018) demonstrate that text classification models are not robust against adversarial examples generated by synonym substitution, paraphrasing, or inserting/deleting characters in the text input. This lack of robustness exposes troubling gaps in current models' language understanding capabilities and creates problems when NLP systems are deployed to real users.
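To make the character-level attacks concrete, the sketch below randomly swaps, deletes, or inserts characters in a few words of a sentence. It is a minimal illustration of the general idea rather than a reproduction of any specific published attack; the victim classifier against which such perturbations would be evaluated is not shown.

```python
import random

def perturb_word(word: str, rng: random.Random) -> str:
    """Apply one random character-level edit: swap, delete, or insert."""
    if len(word) < 3:
        return word
    i = rng.randrange(1, len(word) - 1)  # choose an interior edit position
    op = rng.choice(["swap", "delete", "insert"])
    if op == "swap":
        chars = list(word)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
        return "".join(chars)
    if op == "delete":
        return word[:i] + word[i + 1:]
    return word[:i] + rng.choice("abcdefghijklmnopqrstuvwxyz") + word[i:]

def perturb_sentence(sentence: str, n_edits: int = 2, seed: int = 0) -> str:
    """Return a copy of the sentence with a few noisy words."""
    rng = random.Random(seed)
    words = sentence.split()
    for _ in range(n_edits):
        j = rng.randrange(len(words))
        words[j] = perturb_word(words[j], rng)
    return " ".join(words)

print(perturb_sentence("the movie was surprisingly wonderful"))
```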
As NLP systems are increasingly integrated into people's daily lives and directly interact with end users, it is essential to ensure their reliability. For example, systems that flag hateful social media content for review must be robust to adversaries who wish to evade detection (Hosseini et al., 2017). Defending against these threats requires building systems that are robust to whatever alterations an attacker might apply to the text in order to achieve the desired classifier behavior. Moreover, even if systems perform well on user queries on average, rare but catastrophic errors can lead to serious consequences. In 2017, Facebook's machine translation system mistakenly translated an Arabic Facebook post with the message "Good morning" into a Hebrew phrase that meant "Attack them" (Berger, 2017). As a result, the Israeli police arrested the man who made the post and detained him for several hours until the misunderstanding was resolved. Deployed systems must therefore avoid egregious errors such as wrongly translating non-violent messages into violent ones, and should be tested on "worst-case" non-violent messages.
In this tutorial, we will review the history of adversarial example generation and methods for enhancing the robustness of NLP systems. In particular, we will present recent community efforts on the following topics:
• Algorithms for generating adversarial examples to "debug" NLP systems. We will cover a variety of approaches, such as synonym substitution, syntactically controlled paraphrasing, and character-level adversarial attacks, along with many applications, including sentiment analysis, textual entailment, question answering, and machine translation.
• Robustness to spurious correlations and methods for mitigating dataset bias.
• Adversarial data generation for dataset collection.
• Certified robustness in NLP.
• Debugging and behavior testing of NLP models by adversarial and automatic data generation.
• Lessons and discussion on how to build reliable, accountable NLP systems.
The tutorial will make researchers and practitioners aware of the robustness issues of NLP systems and encourage the research community to propose innovative solutions for developing robust, reliable, and accountable NLP systems.

Detailed Outline
This tutorial presents a systematic overview of frontier approaches to generating adversarial examples that facilitate behavior testing and debugging of NLP systems. We will also review studies revealing that NLP models make predictions based on spurious correlations learned from the data, and discuss approaches to enhancing their robustness. We will motivate the discussion using various NLP tasks and will outline emerging research challenges on this topic at the end of the tutorial. The contents covered in the tutorial are outlined below.

Motivation
We will motivate the audience by demonstrating practical examples where NLP systems are brittle to adversarial examples and data distributional shifts. Then, we will outline the challenges of building reliable and robust NLP systems.

Generating Adversarial Examples for Text Classification
Many NLP problems, such as document categorization, sentiment analysis, and textual entailment, can be modeled as text classification tasks. However, recent studies show that slightly modifying a correctly classified example can cause high-performing models to misclassify it. We will discuss various algorithms for generating such adversarial examples and how these examples can be used to test the behavior of models and facilitate debugging.
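As a concrete illustration of one family of these algorithms, the sketch below performs a greedy synonym-substitution attack: it tries candidate replacements word by word and keeps the ones that most reduce the classifier's confidence in the true label. The `predict_proba` interface and the small synonym table are hypothetical placeholders; published attacks typically draw candidates from WordNet or counter-fitted embeddings and enforce additional semantic-similarity constraints.

```python
from typing import Callable, Dict, List

# Hypothetical synonym table; real attacks use WordNet or embedding neighbors.
SYNONYMS: Dict[str, List[str]] = {
    "wonderful": ["marvelous", "great"],
    "movie": ["film", "picture"],
}

def greedy_synonym_attack(
    words: List[str],
    true_label: int,
    predict_proba: Callable[[List[str]], List[float]],  # returns class probabilities
) -> List[str]:
    """Greedily replace words with synonyms to reduce the true-class probability."""
    best = list(words)
    best_score = predict_proba(best)[true_label]
    for i, word in enumerate(words):
        for synonym in SYNONYMS.get(word, []):
            candidate = best[:i] + [synonym] + best[i + 1:]
            score = predict_proba(candidate)[true_label]
            if score < best_score:  # keep the substitution that hurts the model most
                best, best_score = candidate, score
        if best_score < 0.5:  # confidence in the true label has dropped; likely flipped
            break
    return best
```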

Certified Robustness and Defending against Adversarial Attacks in NLP
Next, we will discuss methods for defending models against adversarial examples. Ensuring robustness even to seemingly simple perturbations, such as typos or synonym replacements, is challenging: since multiple parts of a sentence may be perturbed independently, the space of possible perturbations is combinatorially large. We will discuss methods that augment training data with adversarial examples, as well as methods that produce certificates of robustness, i.e., computationally tractable guarantees that a model is correct on every allowed perturbation of a given input.
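To illustrate what such a certificate asserts, the sketch below verifies robustness by exhaustively enumerating a small substitution-based perturbation set and checking that the prediction never changes. This brute-force check is feasible only for tiny perturbation sets; actual certification methods (e.g., those based on interval bound propagation or randomized smoothing) provide the same kind of guarantee without enumeration. The `predict` function is an assumed black-box classifier interface.

```python
from itertools import product
from typing import Callable, Dict, List

def certify_by_enumeration(
    words: List[str],
    allowed_subs: Dict[int, List[str]],   # position -> allowed replacement words
    predict: Callable[[List[str]], int],  # hypothetical classifier returning a label
) -> bool:
    """Check that the prediction is unchanged under every allowed perturbation.

    The number of perturbed sentences is the product of the per-position
    option counts, so this is tractable only for very small perturbation sets.
    """
    original_label = predict(words)
    # Each position may keep its original word or take any allowed substitute.
    options = [allowed_subs.get(i, []) + [w] for i, w in enumerate(words)]
    for combo in product(*options):
        if predict(list(combo)) != original_label:
            return False  # found a perturbation that flips the prediction
    return True
```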

Robustness to Spurious Correlations
Aside from adversarial attacks, current models are also prone to spurious correlations, i.e., predictive patterns that work well on a specific dataset but do not hold in general. As a result, models fail under even mild distribution shifts. In this part, we will discuss methods that guard against known spurious correlations in the data, as well as the robustness of large-scale pre-trained models.
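One representative family of such methods down-weights training examples that a deliberately biased model already classifies correctly, so that the main model cannot win by exploiting the same shortcut. The sketch below shows this reweighting idea in numpy; the bias-only probabilities are assumed to come from a model trained on the spurious feature alone (e.g., a hypothesis-only model in natural language inference).

```python
import numpy as np

def debias_weights(bias_probs: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Weight each training example by 1 - p_bias(correct label).

    bias_probs: (n_examples, n_classes) probabilities from a bias-only model.
    labels: (n_examples,) gold label indices.
    Examples the bias-only model answers confidently get small weights,
    discouraging the main model from exploiting the same shortcut.
    """
    p_correct = bias_probs[np.arange(len(labels)), labels]
    weights = 1.0 - p_correct
    return weights / weights.mean()  # renormalize so the average weight is 1

# Toy usage: the bias model is confident (0.9) on example 0, uncertain on example 1.
bias_probs = np.array([[0.9, 0.1], [0.5, 0.5]])
labels = np.array([0, 1])
print(debias_weights(bias_probs, labels))  # example 0 gets a much smaller weight
```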

Adversarial Data Collection
Given the flaws in existing datasets, it seems likely that building robust NLP models will also require better ways to collect training data. In this part, we will discuss recent work that collects datasets using an adversarial data generation process, typically involving humans in the loop. We will also discuss connections with classical active learning approaches to data collection.

Adversarial Trigger and Text Generation
While most of the discussion in this tutorial focuses on natural language understanding, many language generation systems directly interact with end users, and ensuring their robustness is equally important. In this part, we will discuss robustness issues in language generation tasks. We will also introduce adversarial triggers, input-agnostic sequences of tokens that cause a model to produce a specific prediction when concatenated to any input from a dataset, and their application to conditional language generation.
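To make the idea concrete, the sketch below performs a brute-force greedy trigger search: for each trigger position, it scores every candidate token by the attacker's objective (e.g., the model's loss on gold labels, or the likelihood of a target output) averaged over a batch with the trigger prepended, and keeps the best one. Published trigger attacks use gradient-guided candidate selection rather than this exhaustive scoring; `batch_loss` here is an assumed black-box interface.

```python
from typing import Callable, List

def search_trigger(
    candidates: List[str],  # candidate trigger tokens; assumed non-empty
    batch: List[str],       # inputs the trigger should affect
    batch_loss: Callable[[List[str], List[str]], float],  # attacker objective for (trigger, batch)
    trigger_len: int = 3,
) -> List[str]:
    """Greedily build an input-agnostic trigger, one token at a time."""
    trigger: List[str] = []
    for _ in range(trigger_len):
        best_token, best_loss = candidates[0], float("-inf")
        for token in candidates:
            loss = batch_loss(trigger + [token], batch)
            if loss > best_loss:  # maximize the attacker's objective over the batch
                best_token, best_loss = token, loss
        trigger.append(best_token)
    return trigger
```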

Conclusion, Future Directions, and Discussion
We will conclude the tutorial by discussing future directions to promote robustness in NLP.

Reading List
While the tutorial will include our own work, it will also cover relevant studies from the broader research community, including the papers cited throughout this proposal.

Prerequisite Knowledge
Our target audience is general NLP conference attendees; therefore, no specific knowledge is assumed of the audience beyond a basic machine learning and NLP background:
• Understand derivatives and gradient descent methods, as found in introductory calculus.
• Understand the basic supervised learning paradigm and commonly used machine learning models such as logistic regression and deep neural networks.
• Be familiar with common natural language processing concepts (e.g., parse trees, word representations) as found in an introductory NLP course.

Tutorial Instructors
The instructors are experts who have conducted research on different aspects of the tutorial topic.