Margaret Mitchell


2022

You reap what you sow: On the Challenges of Bias Evaluation Under Multilingual Settings
Zeerak Talat | Aurélie Névéol | Stella Biderman | Miruna Clinciu | Manan Dey | Shayne Longpre | Sasha Luccioni | Maraim Masoud | Margaret Mitchell | Dragomir Radev | Shanya Sharma | Arjun Subramonian | Jaesung Tae | Samson Tan | Deepak Tunuguntla | Oskar Van Der Wal
Proceedings of BigScience Episode #5 -- Workshop on Challenges & Perspectives in Creating Large Language Models

Evaluating bias, fairness, and social impact in monolingual language models is a difficult task. This challenge is further compounded when language modeling occurs in a multilingual context. Considering the implication of evaluation biases for large multilingual language models, we situate the discussion of bias evaluation within a wider context of social scientific research with computational work. We highlight three dimensions of developing multilingual bias evaluation frameworks: (1) increasing transparency through documentation, (2) expanding targets of bias beyond gender, and (3) addressing cultural differences that exist between languages. We further discuss the power dynamics and consequences of training large language models and recommend that researchers remain cognizant of the ramifications of developing such technologies.

2021

Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus
Jesse Dodge | Maarten Sap | Ana Marasović | William Agnew | Gabriel Ilharco | Dirk Groeneveld | Margaret Mitchell | Matt Gardner
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Large language models have led to remarkable progress on many NLP tasks, and researchers are turning to ever-larger text corpora to train them. Some of the largest corpora available are made by scraping significant portions of the internet, and are frequently introduced with only minimal documentation. In this work we provide some of the first documentation for the Colossal Clean Crawled Corpus (C4; Raffel et al., 2020), a dataset created by applying a set of filters to a single snapshot of Common Crawl. We begin by investigating where the data came from, and find a significant amount of text from unexpected sources like patents and US military websites. Then we explore the content of the text itself, and find machine-generated text (e.g., from machine translation systems) and evaluation examples from other benchmark NLP datasets. To understand the impact of the filters applied to create this dataset, we evaluate the text that was removed, and show that blocklist filtering disproportionately removes text from and about minority individuals. Finally, we conclude with some recommendations for how to create and document web-scale datasets from a scrape of the internet.

2019

Perturbation Sensitivity Analysis to Detect Unintended Model Biases
Vinodkumar Prabhakaran | Ben Hutchinson | Margaret Mitchell
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Data-driven statistical Natural Language Processing (NLP) techniques leverage large amounts of language data to build models that can understand language. However, most language data reflect the public discourse at the time the data was produced, and hence NLP models are susceptible to learning incidental associations around named referents at a particular point in time, in addition to general linguistic meaning. An NLP system designed to model notions such as sentiment and toxicity should ideally produce scores that are independent of the identity of such entities mentioned in text and their social associations. For example, in a general purpose sentiment analysis system, a phrase such as "I hate Katy Perry" should be interpreted as having the same sentiment as "I hate Taylor Swift". Based on this idea, we propose a generic evaluation framework, Perturbation Sensitivity Analysis, which detects unintended model biases related to named entities, and requires no new annotations or corpora. We demonstrate the utility of this analysis by employing it on two different NLP models (a sentiment model and a toxicity model) applied to online comments in English from four different genres.

Proceedings of the Second Workshop on Storytelling
Francis Ferraro | Ting-Hao ‘Kenneth’ Huang | Stephanie M. Lukin | Margaret Mitchell
Proceedings of the Second Workshop on Storytelling

2018

Proceedings of the Second ACL Workshop on Ethics in Natural Language Processing
Mark Alfano | Dirk Hovy | Margaret Mitchell | Michael Strube
Proceedings of the Second ACL Workshop on Ethics in Natural Language Processing

Proceedings of the First Workshop on Storytelling
Margaret Mitchell | Ting-Hao ‘Kenneth’ Huang | Francis Ferraro | Ishan Misra
Proceedings of the First Workshop on Storytelling

2017

Multitask Learning for Mental Health Conditions with Limited Social Media Data
Adrian Benton | Margaret Mitchell | Dirk Hovy
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Language contains information about the author’s demographic attributes as well as their mental state, and has been successfully leveraged in NLP to predict either one alone. However, demographic attributes and mental states also interact with each other, and we are the first to demonstrate how to use them jointly to improve the prediction of mental health conditions across the board. We model the different conditions as tasks in a multitask learning (MTL) framework, and establish for the first time the potential of deep learning in the prediction of mental health from online user-generated text. The framework we propose significantly improves over all baselines and single-task models for predicting mental health conditions, with particularly significant gains for conditions with limited data. In addition, our best MTL model can predict the presence of conditions (neuroatypicality) more generally, further reducing the error of the strong feed-forward baseline.

Proceedings of the First ACL Workshop on Ethics in Natural Language Processing
Dirk Hovy | Shannon Spruit | Margaret Mitchell | Emily M. Bender | Michael Strube | Hanna Wallach
Proceedings of the First ACL Workshop on Ethics in Natural Language Processing

2016

Generating Natural Questions About an Image
Nasrin Mostafazadeh | Ishan Misra | Jacob Devlin | Margaret Mitchell | Xiaodong He | Lucy Vanderwende
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Visual Storytelling
Ting-Hao Kenneth Huang | Francis Ferraro | Nasrin Mostafazadeh | Ishan Misra | Aishwarya Agrawal | Jacob Devlin | Ross Girshick | Xiaodong He | Pushmeet Kohli | Dhruv Batra | C. Lawrence Zitnick | Devi Parikh | Lucy Vanderwende | Michel Galley | Margaret Mitchell
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2015

A Survey of Current Datasets for Vision and Language Research
Francis Ferraro | Nasrin Mostafazadeh | Ting-Hao Huang | Lucy Vanderwende | Jacob Devlin | Michel Galley | Margaret Mitchell
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

A Neural Network Approach to Context-Sensitive Generation of Conversational Responses
Alessandro Sordoni | Michel Galley | Michael Auli | Chris Brockett | Yangfeng Ji | Margaret Mitchell | Jian-Yun Nie | Jianfeng Gao | Bill Dolan
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Quantifying the Language of Schizophrenia in Social Media
Margaret Mitchell | Kristy Hollingshead | Glen Coppersmith
Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality

CLPsych 2015 Shared Task: Depression and PTSD on Twitter
Glen Coppersmith | Mark Dredze | Craig Harman | Kristy Hollingshead | Margaret Mitchell
Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality

Language Models for Image Captioning: The Quirks and What Works
Jacob Devlin | Hao Cheng | Hao Fang | Saurabh Gupta | Li Deng | Xiaodong He | Geoffrey Zweig | Margaret Mitchell
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

deltaBLEU: A Discriminative Metric for Generation Tasks with Intrinsically Diverse Targets
Michel Galley | Chris Brockett | Alessandro Sordoni | Yangfeng Ji | Michael Auli | Chris Quirk | Margaret Mitchell | Jianfeng Gao | Bill Dolan
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

2014

Low-Resource Semantic Role Labeling
Matthew R. Gormley | Margaret Mitchell | Benjamin Van Durme | Mark Dredze
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

I’m a Belieber: Social Roles via Self-identification and Conceptual Attributes
Charley Beller | Rebecca Knowles | Craig Harman | Shane Bergsma | Margaret Mitchell | Benjamin Van Durme
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality
Philip Resnik | Rebecca Resnik | Margaret Mitchell
Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality

Proceedings of the 8th International Natural Language Generation Conference (INLG)
Margaret Mitchell | Kathleen McCoy | David McDonald | Aoife Cahill
Proceedings of the 8th International Natural Language Generation Conference (INLG)

Proceedings of the INLG and SIGDIAL 2014 Joint Session
Margaret Mitchell | Kathleen McCoy | David McDonald | Aoife Cahill
Proceedings of the INLG and SIGDIAL 2014 Joint Session

Crowdsourcing Language Generation Templates for Dialogue Systems
Margaret Mitchell | Dan Bohus | Ece Kamar
Proceedings of the INLG and SIGDIAL 2014 Joint Session

2013

Generating Expressions that Refer to Visible Objects
Margaret Mitchell | Kees van Deemter | Ehud Reiter
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Open Domain Targeted Sentiment
Margaret Mitchell | Jacqui Aguilar | Theresa Wilson | Benjamin Van Durme
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Graphs and Spatial Relations in the Generation of Referring Expressions
Jette Viethen | Margaret Mitchell | Emiel Krahmer
Proceedings of the 14th European Workshop on Natural Language Generation

2012

Midge: Generating Descriptions of Images
Margaret Mitchell | Xufeng Han | Jeff Hayes
INLG 2012 Proceedings of the Seventh International Natural Language Generation Conference

Discourse-Based Modeling for AAC
Margaret Mitchell | Richard Sproat
Proceedings of the Third Workshop on Speech and Language Processing for Assistive Technologies

Midge: Generating Image Descriptions From Computer Vision Detections
Margaret Mitchell | Jesse Dodge | Amit Goyal | Kota Yamaguchi | Karl Stratos | Xufeng Han | Alyssa Mensch | Alex Berg | Tamara Berg | Hal Daumé III
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

Detecting Visual Text
Jesse Dodge | Amit Goyal | Xufeng Han | Alyssa Mensch | Margaret Mitchell | Karl Stratos | Kota Yamaguchi | Yejin Choi | Hal Daumé III | Alex Berg | Tamara Berg
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2011

Two Approaches for Generating Size Modifiers
Margaret Mitchell | Kees van Deemter | Ehud Reiter
Proceedings of the 13th European Workshop on Natural Language Generation

Semi-Supervised Modeling for Prenominal Modifier Ordering
Margaret Mitchell | Aaron Dunlop | Brian Roark
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

Natural Reference to Objects in a Visual Domain
Margaret Mitchell | Kees van Deemter | Ehud Reiter
Proceedings of the 6th International Natural Language Generation Conference

Prenominal Modifier Ordering via Multiple Sequence Alignment
Aaron Dunlop | Margaret Mitchell | Brian Roark
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

2009

Class-Based Ordering of Prenominal Modifiers
Margaret Mitchell
Proceedings of the 12th European Workshop on Natural Language Generation (ENLG 2009)

2007

Syntactic complexity measures for detecting Mild Cognitive Impairment
Brian Roark | Margaret Mitchell | Kristy Hollingshead
Biological, translational, and clinical language processing
