Christopher Homan


2021

pdf bib
Cross-lingual Offensive Language Identification for Low Resource Languages: The Case of Marathi
Saurabh Sampatrao Gaikwad | Tharindu Ranasinghe | Marcos Zampieri | Christopher Homan
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

The widespread presence of offensive language on social media motivated the development of systems capable of recognizing such content automatically. Apart from a few notable exceptions, most research on automatic offensive language identification has dealt with English. To address this shortcoming, we introduce MOLD, the Marathi Offensive Language Dataset. MOLD is the first dataset of its kind compiled for Marathi, thus opening a new domain for research in low-resource Indo-Aryan languages. We present results from several machine learning experiments on this dataset, including zero-short and other transfer learning experiments on state-of-the-art cross-lingual transformers from existing data in Bengali, English, and Hindi.

pdf bib
LCP-RIT at SemEval-2021 Task 1: Exploring Linguistic Features for Lexical Complexity Prediction
Abhinandan Tejalkumar Desai | Kai North | Marcos Zampieri | Christopher Homan
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This paper describes team LCP-RIT’s submission to the SemEval-2021 Task 1: Lexical Complexity Prediction (LCP). The task organizers provided participants with an augmented version of CompLex (Shardlow et al., 2020), an English multi-domain dataset in which words in context were annotated with respect to their complexity using a five point Likert scale. Our system uses logistic regression and a wide range of linguistic features (e.g. psycholinguistic features, n-grams, word frequency, POS tags) to predict the complexity of single words in this dataset. We analyze the impact of different linguistic features on the classification performance and we evaluate the results in terms of mean absolute error, mean squared error, Pearson correlation, and Spearman correlation.

2020

pdf bib
Neural Machine Translation for Extremely Low-Resource African Languages: A Case Study on Bambara
Allahsera Auguste Tapo | Bakary Coulibaly | Sébastien Diarra | Christopher Homan | Julia Kreutzer | Sarah Luger | Arthur Nagashima | Marcos Zampieri | Michael Leventhal
Proceedings of the 3rd Workshop on Technologies for MT of Low Resource Languages

Low-resource languages present unique challenges to (neural) machine translation. We discuss the case of Bambara, a Mande language for which training data is scarce and requires significant amounts of pre-processing. More than the linguistic situation of Bambara itself, the socio-cultural context within which Bambara speakers live poses challenges for automated processing of this language. In this paper, we present the first parallel data set for machine translation of Bambara into and from English and French and the first benchmark results on machine translation to and from Bambara. We discuss challenges in working with low-resource languages and propose strategies to cope with data scarcity in low-resource machine translation (MT).

2018

pdf bib
Sensing and Learning Human Annotators Engaged in Narrative Sensemaking
McKenna Tornblad | Luke Lapresi | Christopher Homan | Raymond Ptucha | Cecilia Ovesdotter Alm
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop

While labor issues and quality assurance in crowdwork are increasingly studied, how annotators make sense of texts and how they are personally impacted by doing so are not. We study these questions via a narrative-sorting annotation task, where carefully selected (by sequentiality, topic, emotional content, and length) collections of tweets serve as examples of everyday storytelling. As readers process these narratives, we measure their facial expressions, galvanic skin response, and self-reported reactions. From the perspective of annotator well-being, a reassuring outcome was that the sorting task did not cause a measurable stress response, however readers reacted to humor. In terms of sensemaking, readers were more confident when sorting sequential, target-topical, and highly emotional tweets. As crowdsourcing becomes more common, this research sheds light onto the perceptive capabilities and emotional impact of human readers.

2017

pdf bib
Understanding the Semantics of Narratives of Interpersonal Violence through Reader Annotations and Physiological Reactions
Alexander Calderwood | Elizabeth A. Pruett | Raymond Ptucha | Christopher Homan | Cecilia Ovesdotter Alm
Proceedings of the Workshop Computational Semantics Beyond Events and Roles

Interpersonal violence (IPV) is a prominent sociological problem that affects people of all demographic backgrounds. By analyzing how readers interpret, perceive, and react to experiences narrated in social media posts, we explore an understudied source for discourse about abuse. We asked readers to annotate Reddit posts about relationships with vs. without IPV for stakeholder roles and emotion, while measuring their galvanic skin response (GSR), pulse, and facial expression. We map annotations to coreference resolution output to obtain a labeled coreference chain for stakeholders in texts, and apply automated semantic role labeling for analyzing IPV discourse. Findings provide insights into how readers process roles and emotion in narratives. For example, abusers tend to be linked with violent actions and certain affect states. We train classifiers to predict stakeholder categories of coreference chains. We also find that subjects’ GSR noticeably changed for IPV texts, suggesting that co-collected measurement-based data about annotators can be used to support text annotation.

2016

pdf bib
Generating Clinically Relevant Texts: A Case Study on Life-Changing Events
Mayuresh Oak | Anil Behera | Titus Thomas | Cecilia Ovesdotter Alm | Emily Prud’hommeaux | Christopher Homan | Raymond Ptucha
Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology

pdf bib
Analyzing Gender Bias in Student Evaluations
Andamlak Terkik | Emily Prud’hommeaux | Cecilia Ovesdotter Alm | Christopher Homan | Scott Franklin
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

University students in the United States are routinely asked to provide feedback on the quality of the instruction they have received. Such feedback is widely used by university administrators to evaluate teaching ability, despite growing evidence that students assign lower numerical scores to women and people of color, regardless of the actual quality of instruction. In this paper, we analyze students’ written comments on faculty evaluation forms spanning eight years and five STEM disciplines in order to determine whether open-ended comments reflect these same biases. First, we apply sentiment analysis techniques to the corpus of comments to determine the overall affect of each comment. We then use this information, in combination with other features, to explore whether there is bias in how students describe their instructors. We show that while the gender of the evaluated instructor does not seem to affect students’ expressed level of overall satisfaction with their instruction, it does strongly influence the language that they use to describe their instructors and their experience in class.

pdf bib
Understanding Discourse on Work and Job-Related Well-Being in Public Social Media
Tong Liu | Christopher Homan | Cecilia Ovesdotter Alm | Megan Lytle | Ann Marie White | Henry Kautz
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

pdf bib
An Analysis of Domestic Abuse Discourse on Reddit
Nicolas Schrading | Cecilia Ovesdotter Alm | Ray Ptucha | Christopher Homan
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
#WhyIStayed, #WhyILeft: Microblogging to Make Sense of Domestic Abuse
Nicolas Schrading | Cecilia Ovesdotter Alm | Raymond Ptucha | Christopher Homan
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2014

pdf bib
Toward Macro-Insights for Suicide Prevention: Analyzing Fine-Grained Distress at Scale
Christopher Homan | Ravdeep Johar | Tong Liu | Megan Lytle | Vincent Silenzio | Cecilia Ovesdotter Alm
Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality