Proceedings of the First Workshop on Trustworthy Natural Language Processing

Yada Pruksachatkun, Anil Ramakrishna, Kai-Wei Chang, Satyapriya Krishna, Jwala Dhamala, Tanaya Guha, Xiang Ren (Editors)

Anthology ID:
Association for Computational Linguistics
Bib Export formats:

pdf bib
Proceedings of the First Workshop on Trustworthy Natural Language Processing
Yada Pruksachatkun | Anil Ramakrishna | Kai-Wei Chang | Satyapriya Krishna | Jwala Dhamala | Tanaya Guha | Xiang Ren

pdf bib
Interpretability Rules: Jointly Bootstrapping a Neural Relation Extractorwith an Explanation Decoder
Zheng Tang | Mihai Surdeanu

We introduce a method that transforms a rule-based relation extraction (RE) classifier into a neural one such that both interpretability and performance are achieved. Our approach jointly trains a RE classifier with a decoder that generates explanations for these extractions, using as sole supervision a set of rules that match these relations. Our evaluation on the TACRED dataset shows that our neural RE classifier outperforms the rule-based one we started from by 9 F1 points; our decoder generates explanations with a high BLEU score of over 90%; and, the joint learning improves the performance of both the classifier and decoder.

pdf bib
Measuring Biases of Word Embeddings: What Similarity Measures and Descriptive Statistics to Use?
Hossein Azarpanah | Mohsen Farhadloo

Word embeddings are widely used in Natural Language Processing (NLP) for a vast range of applications. However, it has been consistently proven that these embeddings reflect the same human biases that exist in the data used to train them. Most of the introduced bias indicators to reveal word embeddings’ bias are average-based indicators based on the cosine similarity measure. In this study, we examine the impacts of different similarity measures as well as other descriptive techniques than averaging in measuring the biases of contextual and non-contextual word embeddings. We show that the extent of revealed biases in word embeddings depends on the descriptive statistics and similarity measures used to measure the bias. We found that over the ten categories of word embedding association tests, Mahalanobis distance reveals the smallest bias, and Euclidean distance reveals the largest bias in word embeddings. In addition, the contextual models reveal less severe biases than the non-contextual word embedding models.

pdf bib
Private Release of Text Embedding Vectors
Oluwaseyi Feyisetan | Shiva Kasiviswanathan

Ensuring strong theoretical privacy guarantees on text data is a challenging problem which is usually attained at the expense of utility. However, to improve the practicality of privacy preserving text analyses, it is essential to design algorithms that better optimize this tradeoff. To address this challenge, we propose a release mechanism that takes any (text) embedding vector as input and releases a corresponding private vector. The mechanism satisfies an extension of differential privacy to metric spaces. Our idea based on first randomly projecting the vectors to a lower-dimensional space and then adding noise in this projected space generates private vectors that achieve strong theoretical guarantees on its utility. We support our theoretical proofs with empirical experiments on multiple word embedding models and NLP datasets, achieving in some cases more than 10% gains over the existing state-of-the-art privatization techniques.

pdf bib
Accountable Error Characterization
Amita Misra | Zhe Liu | Jalal Mahmud

Customers of machine learning systems demand accountability from the companies employing these algorithms for various prediction tasks. Accountability requires understanding of system limit and condition of erroneous predictions, as customers are often interested in understanding the incorrect predictions, and model developers are absorbed in finding methods that can be used to get incremental improvements to an existing system. Therefore, we propose an accountable error characterization method, AEC, to understand when and where errors occur within the existing black-box models. AEC, as constructed with human-understandable linguistic features, allows the model developers to automatically identify the main sources of errors for a given classification system. It can also be used to sample for the set of most informative input points for a next round of training. We perform error detection for a sentiment analysis task using AEC as a case study. Our results on the sample sentiment task show that AEC is able to characterize erroneous predictions into human understandable categories and also achieves promising results on selecting erroneous samples when compared with the uncertainty-based sampling.

pdf bib
xER: An Explainable Model for Entity Resolution using an Efficient Solution for the Clique Partitioning Problem
Samhita Vadrevu | Rakesh Nagi | JinJun Xiong | Wen-mei Hwu

In this paper, we propose a global, self- explainable solution to solve a prominent NLP problem: Entity Resolution (ER). We formu- late ER as a graph partitioning problem. Every mention of a real-world entity is represented by a node in the graph, and the pairwise sim- ilarity scores between the mentions are used to associate these nodes to exactly one clique, which represents a real-world entity in the ER domain. In this paper, we use Clique Partition- ing Problem (CPP), which is an Integer Pro- gram (IP) to formulate ER as a graph partition- ing problem and then highlight the explainable nature of this method. Since CPP is NP-Hard, we introduce an efficient solution procedure, the xER algorithm, to solve CPP as a combi- nation of finding maximal cliques in the graph and then performing generalized set packing using a novel formulation. We discuss the advantages of using xER over the traditional methods and provide the computational exper- iments and results of applying this method to ER data sets.

pdf bib
Gender Bias in Natural Language Processing Across Human Languages
Abigail Matthews | Isabella Grasso | Christopher Mahoney | Yan Chen | Esma Wali | Thomas Middleton | Mariama Njie | Jeanna Matthews

Natural Language Processing (NLP) systems are at the heart of many critical automated decision-making systems making crucial recommendations about our future world. Gender bias in NLP has been well studied in English, but has been less studied in other languages. In this paper, a team including speakers of 9 languages - Chinese, Spanish, English, Arabic, German, French, Farsi, Urdu, and Wolof - reports and analyzes measurements of gender bias in the Wikipedia corpora for these 9 languages. We develop extensions to profession-level and corpus-level gender bias metric calculations originally designed for English and apply them to 8 other languages, including languages that have grammatically gendered nouns including different feminine, masculine, and neuter profession words. We discuss future work that would benefit immensely from a computational linguistics perspective.

pdf bib
Interpreting Text Classifiers by Learning Context-sensitive Influence of Words
Sawan Kumar | Kalpit Dixit | Kashif Shah

Many existing approaches for interpreting text classification models focus on providing importance scores for parts of the input text, such as words, but without a way to test or improve the interpretation method itself. This has the effect of compounding the problem of understanding or building trust in the model, with the interpretation method itself adding to the opacity of the model. Further, importance scores on individual examples are usually not enough to provide a sufficient picture of model behavior. To address these concerns, we propose MOXIE (MOdeling conteXt-sensitive InfluencE of words) with an aim to enable a richer interface for a user to interact with the model being interpreted and to produce testable predictions. In particular, we aim to make predictions for importance scores, counterfactuals and learned biases with MOXIE. In addition, with a global learning objective, MOXIE provides a clear path for testing and improving itself. We evaluate the reliability and efficiency of MOXIE on the task of sentiment analysis.

pdf bib
Towards Benchmarking the Utility of Explanations for Model Debugging
Maximilian Idahl | Lijun Lyu | Ujwal Gadiraju | Avishek Anand

Post-hoc explanation methods are an important class of approaches that help understand the rationale underlying a trained model’s decision. But how useful are they for an end-user towards accomplishing a given task? In this vision paper, we argue the need for a benchmark to facilitate evaluations of the utility of post-hoc explanation methods. As a first step to this end, we enumerate desirable properties that such a benchmark should possess for the task of debugging text classifiers. Additionally, we highlight that such a benchmark facilitates not only assessing the effectiveness of explanations but also their efficiency.