Alan Ritter


2021

pdf bib
Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)
Wei Xu | Alan Ritter | Tim Baldwin | Afshin Rahimi
Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)

pdf bib
Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts
Ashutosh Baheti | Maarten Sap | Alan Ritter | Mark Riedl
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Dialogue models trained on human conversations inadvertently learn to generate toxic responses. In addition to producing explicitly offensive utterances, these models can also implicitly insult a group or individual by aligning themselves with an offensive statement. To better understand the dynamics of contextually offensive language, we investigate the stance of dialogue model responses in offensive Reddit conversations. Specifically, we create ToxiChat, a crowd-annotated dataset of 2,000 Reddit threads and model responses labeled with offensive language and stance. Our analysis reveals that 42% of human responses agree with toxic comments, whereas only 13% agree with safe comments. This undesirable behavior is learned by neural dialogue models, such as DialoGPT, which we show are two times more likely to agree with offensive comments. To enable automatic detection of offensive language, we fine-tuned transformer-based classifiers on ToxiChat that achieve 0.71 F1 for offensive labels and 0.53 Macro-F1 for stance labels. Finally, we quantify the effectiveness of controllable text generation (CTG) methods to mitigate the tendency of neural dialogue models to agree with offensive comments. Compared to the baseline, our best CTG model achieves a 19% reduction in agreement with offensive comments and produces 29% fewer offensive replies. Our work highlights the need for further efforts to characterize and analyze inappropriate behavior in dialogue models, in order to help make them safer.

pdf bib
Pre-train or Annotate? Domain Adaptation with a Constrained Budget
Fan Bai | Alan Ritter | Wei Xu
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Recent work has demonstrated that pre-training in-domain language models can boost performance when adapting to a new domain. However, the costs associated with pre-training raise an important question: given a fixed budget, what steps should an NLP practitioner take to maximize performance? In this paper, we study domain adaptation under budget constraints, and approach it as a customer choice problem between data annotation and pre-training. Specifically, we measure the annotation cost of three procedural text datasets and the pre-training cost of three in-domain language models. Then we evaluate the utility of different combinations of pre-training and data annotation under varying budget constraints to assess which combination strategy works best. We find that, for small budgets, spending all funds on annotation leads to the best performance; once the budget becomes large enough, a combination of data annotation and in-domain pre-training works more optimally. We therefore suggest that task-specific data annotation should be part of an economical strategy when adapting an NLP model to a new domain.

pdf bib
Model Selection for Cross-lingual Transfer
Yang Chen | Alan Ritter
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Transformers that are pre-trained on multilingual corpora, such as, mBERT and XLM-RoBERTa, have achieved impressive cross-lingual transfer capabilities. In the zero-shot transfer setting, only English training data is used, and the fine-tuned model is evaluated on another target language. While this works surprisingly well, substantial variance has been observed in target language performance between different fine-tuning runs, and in the zero-shot setup, no target-language development data is available to select among multiple fine-tuned models. Prior work has relied on English dev data to select among models that are fine-tuned with different learning rates, number of steps and other hyperparameters, often resulting in suboptimal choices. In this paper, we show that it is possible to select consistently better models when small amounts of annotated data are available in auxiliary pivot languages. We propose a machine learning approach to model selection that uses the fine-tuned model’s own internal representations to predict its cross-lingual capabilities. In extensive experiments we find that this method consistently selects better models than English validation data across twenty five languages (including eight low-resource languages), and often achieves results that are comparable to model selection using target language development data.

pdf bib
Process-Level Representation of Scientific Protocols with Interactive Annotation
Ronen Tamari | Fan Bai | Alan Ritter | Gabriel Stanovsky
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

We develop Process Execution Graphs (PEG), a document-level representation of real-world wet lab biochemistry protocols, addressing challenges such as cross-sentence relations, long-range coreference, grounding, and implicit arguments. We manually annotate PEGs in a corpus of complex lab protocols with a novel interactive textual simulator that keeps track of entity traits and semantic constraints during annotation. We use this data to develop graph-prediction models, finding them to be good at entity identification and local relation extraction, while our corpus facilitates further exploration of challenging long-range relations.

2020

pdf bib
Fluent Response Generation for Conversational Question Answering
Ashutosh Baheti | Alan Ritter | Kevin Small
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Question answering (QA) is an important aspect of open-domain conversational agents, garnering specific research focus in the conversational QA (ConvQA) subtask. One notable limitation of recent ConvQA efforts is the response being answer span extraction from the target corpus, thus ignoring the natural language generation (NLG) aspect of high-quality conversational agents. In this work, we propose a method for situating QA responses within a SEQ2SEQ NLG approach to generate fluent grammatical answer responses while maintaining correctness. From a technical perspective, we use data augmentation to generate training data for an end-to-end system. Specifically, we develop Syntactic Transformations (STs) to produce question-specific candidate answer responses and rank them using a BERT-based classifier (Devlin et al., 2019). Human evaluation on SQuAD 2.0 data (Rajpurkar et al., 2018) demonstrate that the proposed model outperforms baseline CoQA and QuAC models in generating conversational responses. We further show our model’s scalability by conducting tests on the CoQA dataset. The code and data are available at https://github.com/abaheti95/QADialogSystem.

pdf bib
Code and Named Entity Recognition in StackOverflow
Jeniya Tabassum | Mounica Maddela | Wei Xu | Alan Ritter
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

There is an increasing interest in studying natural language and computer code together, as large corpora of programming texts become readily available on the Internet. For example, StackOverflow currently has over 15 million programming related questions written by 8.5 million users. Meanwhile, there is still a lack of fundamental NLP techniques for identifying code tokens or software-related named entities that appear within natural language sentences. In this paper, we introduce a new named entity recognition (NER) corpus for the computer programming domain, consisting of 15,372 sentences annotated with 20 fine-grained entity types. We trained in-domain BERT representations (BERTOverflow) on 152 million sentences from StackOverflow, which lead to an absolute increase of +10 F1 score over off-the-shelf BERT. We also present the SoftNER model which achieves an overall 79.10 F-1 score for code and named entity recognition on StackOverflow data. Our SoftNER model incorporates a context-independent code token classifier with corpus-level features to improve the BERT-based tagging model. Our code and data are available at: https://github.com/jeniyat/StackOverflowNER/

pdf bib
Measuring Forecasting Skill from Text
Shi Zong | Alan Ritter | Eduard Hovy
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

People vary in their ability to make accurate predictions about the future. Prior studies have shown that some individuals can predict the outcome of future events with consistently better accuracy. This leads to a natural question: what makes some forecasters better than others? In this paper we explore connections between the language people use to describe their predictions and their forecasting skill. Datasets from two different forecasting domains are explored: (1) geopolitical forecasts from Good Judgment Open, an online prediction forum and (2) a corpus of company earnings forecasts made by financial analysts. We present a number of linguistic metrics which are computed over text associated with people’s predictions about the future including: uncertainty, readability, and emotion. By studying linguistic factors associated with predictions, we are able to shed some light on the approach taken by skilled forecasters. Furthermore, we demonstrate that it is possible to accurately predict forecasting skill using a model that is based solely on language. This could potentially be useful for identifying accurate predictions or potentially skilled forecasters earlier.

pdf bib
An Empirical Study of Pre-trained Transformers for Arabic Information Extraction
Wuwei Lan | Yang Chen | Wei Xu | Alan Ritter
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Multilingual pre-trained Transformers, such as mBERT (Devlin et al., 2019) and XLM-RoBERTa (Conneau et al., 2020a), have been shown to enable effective cross-lingual zero-shot transfer. However, their performance on Arabic information extraction (IE) tasks is not very well studied. In this paper, we pre-train a customized bilingual BERT, dubbed GigaBERT, that is designed specifically for Arabic NLP and English-to-Arabic zero-shot transfer learning. We study GigaBERT’s effectiveness on zero-short transfer across four IE tasks: named entity recognition, part-of-speech tagging, argument role labeling, and relation extraction. Our best model significantly outperforms mBERT, XLM-RoBERTa, and AraBERT (Antoun et al., 2020) in both the supervised and zero-shot transfer settings. We have made our pre-trained models publicly available at: https://github.com/lanwuwei/GigaBERT.

pdf bib
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)
Wei Xu | Alan Ritter | Tim Baldwin | Afshin Rahimi
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)

pdf bib
WNUT-2020 Task 1 Overview: Extracting Entities and Relations from Wet Lab Protocols
Jeniya Tabassum | Wei Xu | Alan Ritter
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)

This paper presents the results of the wet labinformation extraction task at WNUT 2020.This task consisted of two sub tasks- (1) anamed entity recognition task with 13 partic-ipants; and (2) a relation extraction task with2 participants. We outline the task, data an-notation process, corpus statistics, and providea high-level overview of the participating sys-tems for each sub task.

2019

pdf bib
Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)
Wei Xu | Alan Ritter | Tim Baldwin | Afshin Rahimi
Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)

pdf bib
Analyzing the Perceived Severity of Cybersecurity Threats Reported on Social Media
Shi Zong | Alan Ritter | Graham Mueller | Evan Wright
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Breaking cybersecurity events are shared across a range of websites, including security blogs (FireEye, Kaspersky, etc.), in addition to social media platforms such as Facebook and Twitter. In this paper, we investigate methods to analyze the severity of cybersecurity threats based on the language that is used to describe them online. A corpus of 6,000 tweets describing software vulnerabilities is annotated with authors’ opinions toward their severity. We show that our corpus supports the development of automatic classifiers with high precision for this task. Furthermore, we demonstrate the value of analyzing users’ opinions about the severity of threats reported online as an early indicator of important software vulnerabilities. We present a simple, yet effective method for linking software vulnerabilities reported in tweets to Common Vulnerabilities and Exposures (CVEs) in the National Vulnerability Database (NVD). Using our predicted severity scores, we show that it is possible to achieve a Precision@50 of 0.86 when forecasting high severity vulnerabilities, significantly outperforming a baseline that is based on tweet volume. Finally we show how reports of severe vulnerabilities online are predictive of real-world exploits.

pdf bib
Structured Minimally Supervised Learning for Neural Relation Extraction
Fan Bai | Alan Ritter
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We present an approach to minimally supervised relation extraction that combines the benefits of learned representations and structured learning, and accurately predicts sentence-level relation mentions given only proposition-level supervision from a KB. By explicitly reasoning about missing data during learning, our approach enables large-scale training of 1D convolutional neural networks while mitigating the issue of label noise inherent in distant supervision. Our approach achieves state-of-the-art results on minimally supervised sentential relation extraction, outperforming a number of baselines, including a competitive approach that uses the attention layer of a purely neural model.

2018

pdf bib
Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text
Wei Xu | Alan Ritter | Tim Baldwin | Afshin Rahimi
Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text

pdf bib
An Annotated Corpus for Machine Reading of Instructions in Wet Lab Protocols
Chaitanya Kulkarni | Wei Xu | Alan Ritter | Raghu Machiraju
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

We describe an effort to annotate a corpus of natural language instructions consisting of 622 wet lab protocols to facilitate automatic or semi-automatic conversion of protocols into a machine-readable format and benefit biological research. Experimental results demonstrate the utility of our corpus for developing machine learning approaches to shallow semantic parsing of instructional texts. We make our annotated Wet Lab Protocol Corpus available to the research community.

pdf bib
Generating More Interesting Responses in Neural Conversation Models with Distributional Constraints
Ashutosh Baheti | Alan Ritter | Jiwei Li | Bill Dolan
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Neural conversation models tend to generate safe, generic responses for most inputs. This is due to the limitations of likelihood-based decoding objectives in generation tasks with diverse outputs, such as conversation. To address this challenge, we propose a simple yet effective approach for incorporating side information in the form of distributional constraints over the generated responses. We propose two constraints that help generate more content rich responses that are based on a model of syntax and topics (Griffiths et al., 2005) and semantic similarity (Arora et al., 2016). We evaluate our approach against a variety of competitive baselines, using both automatic metrics and human judgments, showing that our proposed approach generates responses that are much less generic without sacrificing plausibility. A working demo of our code can be found at https://github.com/abaheti95/DC-NeuralConversation.

2017

pdf bib
“i have a feeling trump will win..................”: Forecasting Winners and Losers from User Predictions on Twitter
Sandesh Swamy | Alan Ritter | Marie-Catherine de Marneffe
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Social media users often make explicit predictions about upcoming events. Such statements vary in the degree of certainty the author expresses toward the outcome: “Leonardo DiCaprio will win Best Actor” vs. “Leonardo DiCaprio may win” or “No way Leonardo wins!”. Can popular beliefs on social media predict who will win? To answer this question, we build a corpus of tweets annotated for veridicality on which we train a log-linear classifier that detects positive veridicality with high precision. We then forecast uncertain outcomes using the wisdom of crowds, by aggregating users’ explicit predictions. Our method for forecasting winners is fully automated, relying only on a set of contenders as input. It requires no training data of past outcomes and outperforms sentiment and tweet volume baselines on a broad range of contest prediction tasks. We further demonstrate how our approach can be used to measure the reliability of individual accounts’ predictions and retrospectively identify surprise outcomes.

pdf bib
Adversarial Learning for Neural Dialogue Generation
Jiwei Li | Will Monroe | Tianlin Shi | Sébastien Jean | Alan Ritter | Dan Jurafsky
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We apply adversarial training to open-domain dialogue generation, training a system to produce sequences that are indistinguishable from human-generated dialogue utterances. We cast the task as a reinforcement learning problem where we jointly train two systems: a generative model to produce response sequences, and a discriminator—analagous to the human evaluator in the Turing test— to distinguish between the human-generated dialogues and the machine-generated ones. In this generative adversarial network approach, the outputs from the discriminator are used to encourage the system towards more human-like dialogue. Further, we investigate models for adversarial evaluation that uses success in fooling an adversary as a dialogue evaluation metric, while avoiding a number of potential pitfalls. Experimental results on several metrics, including adversarial evaluation, demonstrate that the adversarially-trained system generates higher-quality responses than previous baselines

pdf bib
Proceedings of the 3rd Workshop on Noisy User-generated Text
Leon Derczynski | Wei Xu | Alan Ritter | Tim Baldwin
Proceedings of the 3rd Workshop on Noisy User-generated Text

2016

pdf bib
Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)
Bo Han | Alan Ritter | Leon Derczynski | Wei Xu | Tim Baldwin
Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)

pdf bib
Results of the WNUT16 Named Entity Recognition Shared Task
Benjamin Strauss | Bethany Toma | Alan Ritter | Marie-Catherine de Marneffe | Wei Xu
Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)

This paper presents the results of the Twitter Named Entity Recognition shared task associated with W-NUT 2016: a named entity tagging task with 10 teams participating. We outline the shared task, annotation process and dataset statistics, and provide a high-level overview of the participating systems for each shared task.

pdf bib
TweeTime : A Minimally Supervised Method for Recognizing and Normalizing Time Expressions in Twitter
Jeniya Tabassum | Alan Ritter | Wei Xu
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Deep Reinforcement Learning for Dialogue Generation
Jiwei Li | Will Monroe | Alan Ritter | Dan Jurafsky | Michel Galley | Jianfeng Gao
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
SemEval-2016 Task 4: Sentiment Analysis in Twitter
Preslav Nakov | Alan Ritter | Sara Rosenthal | Fabrizio Sebastiani | Veselin Stoyanov
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2015

pdf bib
Proceedings of the Workshop on Noisy User-generated Text
Wei Xu | Bo Han | Alan Ritter
Proceedings of the Workshop on Noisy User-generated Text

pdf bib
Shared Tasks of the 2015 Workshop on Noisy User-generated Text: Twitter Lexical Normalization and Named Entity Recognition
Timothy Baldwin | Marie Catherine de Marneffe | Bo Han | Young-Bum Kim | Alan Ritter | Wei Xu
Proceedings of the Workshop on Noisy User-generated Text

pdf bib
SemEval-2015 Task 10: Sentiment Analysis in Twitter
Sara Rosenthal | Preslav Nakov | Svetlana Kiritchenko | Saif Mohammad | Alan Ritter | Veselin Stoyanov
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

2014

pdf bib
SemEval-2014 Task 9: Sentiment Analysis in Twitter
Sara Rosenthal | Alan Ritter | Preslav Nakov | Veselin Stoyanov
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

pdf bib
Major Life Event Extraction from Twitter based on Congratulations/Condolences Speech Acts
Jiwei Li | Alan Ritter | Claire Cardie | Eduard Hovy
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Extracting Lexically Divergent Paraphrases from Twitter
Wei Xu | Alan Ritter | Chris Callison-Burch | William B. Dolan | Yangfeng Ji
Transactions of the Association for Computational Linguistics, Volume 2

We present MultiP (Multi-instance Learning Paraphrase Model), a new model suited to identify paraphrases within the short messages on Twitter. We jointly model paraphrase relations between word and sentence pairs and assume only sentence-level annotations during learning. Using this principled latent variable model alone, we achieve the performance competitive with a state-of-the-art method which combines a latent space model with a feature-based supervised classifier. Our model also captures lexically divergent paraphrases that differ from yet complement previous methods; combining our model with previous work significantly outperforms the state-of-the-art. In addition, we present a novel annotation methodology that has allowed us to crowdsource a paraphrase corpus from Twitter. We make this new dataset available to the research community.

pdf bib
Weakly Supervised User Profile Extraction from Twitter
Jiwei Li | Alan Ritter | Eduard Hovy
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2013

pdf bib
Twitter Part-of-Speech Tagging for All: Overcoming Sparse and Noisy Data
Leon Derczynski | Alan Ritter | Sam Clark | Kalina Bontcheva
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

pdf bib
SemEval-2013 Task 2: Sentiment Analysis in Twitter
Preslav Nakov | Sara Rosenthal | Zornitsa Kozareva | Veselin Stoyanov | Alan Ritter | Theresa Wilson
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

pdf bib
Modeling Missing Data in Distant Supervision for Information Extraction
Alan Ritter | Luke Zettlemoyer | Mausam | Oren Etzioni
Transactions of the Association for Computational Linguistics, Volume 1

Distant supervision algorithms learn information extraction models given only large readily available databases and text collections. Most previous work has used heuristics for generating labeled data, for example assuming that facts not contained in the database are not mentioned in the text, and facts in the database must be mentioned at least once. In this paper, we propose a new latent-variable approach that models missing data. This provides a natural way to incorporate side information, for instance modeling the intuition that text will often mention rare entities which are likely to be missing in the database. Despite the added complexity introduced by reasoning about missing data, we demonstrate that a carefully designed local search approach to inference is very accurate and scales to large datasets. Experiments demonstrate improved performance for binary and unary relation extraction when compared to learning with heuristic labels, including on average a 27% increase in area under the precision recall curve in the binary case.

pdf bib
A Preliminary Study of Tweet Summarization using Information Extraction
Wei Xu | Ralph Grishman | Adam Meyers | Alan Ritter
Proceedings of the Workshop on Language Analysis in Social Media

pdf bib
Gathering and Generating Paraphrases from Twitter with Application to Normalization
Wei Xu | Alan Ritter | Ralph Grishman
Proceedings of the Sixth Workshop on Building and Using Comparable Corpora

2012

pdf bib
Paraphrasing for Style
Wei Xu | Alan Ritter | Bill Dolan | Ralph Grishman | Colin Cherry
Proceedings of COLING 2012

2011

pdf bib
Data-Driven Response Generation in Social Media
Alan Ritter | Colin Cherry | William B. Dolan
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf bib
Named Entity Recognition in Tweets: An Experimental Study
Alan Ritter | Sam Clark | Mausam | Oren Etzioni
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

pdf bib
Unsupervised Modeling of Twitter Conversations
Alan Ritter | Colin Cherry | Bill Dolan
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
A Latent Dirichlet Allocation Method for Selectional Preferences
Alan Ritter | Mausam | Oren Etzioni
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

pdf bib
Machine Reading at the University of Washington
Hoifung Poon | Janara Christensen | Pedro Domingos | Oren Etzioni | Raphael Hoffmann | Chloe Kiddon | Thomas Lin | Xiao Ling | Mausam | Alan Ritter | Stefan Schoenmackers | Stephen Soderland | Dan Weld | Fei Wu | Congle Zhang
Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading

2008

pdf bib
It’s a Contradiction – no, it’s not: A Case Study using Functional Relations
Alan Ritter | Stephen Soderland | Doug Downey | Oren Etzioni
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing