Proceedings of the Second Workshop on NLP for Positive Impact (NLP4PI)

Laura Biester, Dorottya Demszky, Zhijing Jin, Mrinmaya Sachan, Joel Tetreault, Steven Wilson, Lu Xiao, Jieyu Zhao (Editors)


Anthology ID:
2022.nlp4pi-1
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates (Hybrid)
Venue:
NLP4PI
SIG:
Publisher:
Association for Computational Linguistics
URL:
https://aclanthology.org/2022.nlp4pi-1
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
https://aclanthology.org/2022.nlp4pi-1.pdf

pdf bib
Proceedings of the Second Workshop on NLP for Positive Impact (NLP4PI)
Laura Biester | Dorottya Demszky | Zhijing Jin | Mrinmaya Sachan | Joel Tetreault | Steven Wilson | Lu Xiao | Jieyu Zhao

pdf bib
A unified framework for cross-domain and cross-task learning of mental health conditions
Huikai Chua | Andrew Caines | Helen Yannakoudakis

The detection of mental health conditions based on an individual’s use of language has received considerable attention in the NLP community. However, most work has focused on single-task and single-domain models, limiting the semantic space that they are able to cover and risking significant cross-domain loss. In this paper, we present two approaches towards a unified framework for cross-domain and cross-task learning for the detection of depression, post-traumatic stress disorder and suicide risk across different platforms that further utilizes inductive biases across tasks. Firstly, we develop a lightweight model using a general set of features that sets a new state of the art on several tasks while matching the performance of more complex task- and domain-specific systems on others. We also propose a multi-task approach and further extend our framework to explicitly capture the affective characteristics of someone’s language, further consolidating transfer of inductive biases and of shared linguistic characteristics. Finally, we present a novel dynamically adaptive loss weighting approach that allows for more stable learning across imbalanced datasets and better neural generalization performance. Our results demonstrate the effectiveness of our unified framework for mental ill-health detection across a number of diverse English datasets.

pdf bib
Critical Perspectives: A Benchmark Revealing Pitfalls in PerspectiveAPI
Lucas Rosenblatt | Lorena Piedras | Julia Wilkins

Detecting “toxic” language in internet content is a pressing social and technical challenge. In this work, we focus on Perspective API from Jigsaw, a state-of-the-art tool that promises to score the “toxicity” of text, with a recent model update that claims impressive results (Lees et al., 2022). We seek to challenge certain normative claims about toxic language by proposing a new benchmark, Selected Adversarial SemanticS, or SASS. We evaluate Perspective on SASS, and compare to low-effort alternatives, like zero-shot and few-shot GPT-3 prompt models, in binary classification settings. We find that Perspective exhibits troubling shortcomings across a number of our toxicity categories. SASS provides a new tool for evaluating performance on previously undetected toxic language that avoids common normative pitfalls. Our work leads us to emphasize the importance of questioning assumptions made by tools already in deployment for toxicity detection in order to anticipate and prevent disparate harms.

pdf bib
Securely Capturing People’s Interactions with Voice Assistants at Home: A Bespoke Tool for Ethical Data Collection
Angus Addlesee

Speech production is nuanced and unique to every individual, but today’s Spoken Dialogue Systems (SDSs) are trained to use general speech patterns to successfully improve performance on various evaluation metrics. However, these patterns do not apply to certain user groups - often the very people that can benefit the most from SDSs. For example, people with dementia produce more disfluent speech than the general population. In order to evaluate systems with specific user groups in mind, and to guide the design of such systems to deliver maximum benefit to these users, data must be collected securely. In this short paper we present CVR-SI, a bespoke tool for ethical data collection. Designed for the healthcare domain, we argue that it should also be used in more general settings. We detail how off-the-shelf solutions fail to ensure that sensitive data remains secure and private. We then describe the ethical design and security features of our device, with a full guide on how to build both the hardware and software components of CVR-SI. Our design ensures inclusivity to all researchers in this field, particularly those who are not hardware experts. This guarantees everyone can collect appropriate data for human evaluation ethically, securely, and in a timely manner.

pdf bib
Leveraging World Knowledge in Implicit Hate Speech Detection
Jessica Lin

While much attention has been paid to identifying explicit hate speech, implicit hateful expressions that are disguised in coded or indirect language are pervasive and remain a major challenge for existing hate speech detection systems. This paper presents the first attempt to apply Entity Linking (EL) techniques to both explicit and implicit hate speech detection, where we show that such real world knowledge about entity mentions in a text does help models better detect hate speech, and the benefit of adding it into the model is more pronounced when explicit entity triggers (e.g., rally, KKK) are present. We also discuss cases where real world knowledge does not add value to hate speech detection, which provides more insights into understanding and modeling the subtleties of hate speech.

pdf bib
A Dataset of Sustainable Diet Arguments on Twitter
Marcus Hansen | Daniel Hershcovich

Sustainable development requires a significant change in our dietary habits. Argument mining can help achieve this goal by both affecting and helping understand people’s behavior. We design an annotation scheme for argument mining from online discourse around sustainable diets, including novel evidence types specific to this domain. Using Twitter as a source, we crowdsource a dataset of 597 tweets annotated in relation to 5 topics. We benchmark a variety of NLP models on this dataset, demonstrating strong performance in some sub-tasks, while highlighting remaining challenges.

pdf bib
Impacts of Low Socio-economic Status on Educational Outcomes: A Narrative Based Analysis
Motti Kelbessa | Ilyas Jamil | Labiba Jahan

Socioeconomic status (SES) is a metric used to compare a person’s social standing based on their income, level of education, and occupation. Students from low SES backgrounds are those whose parents have low income and have limited access to the resources and opportunities they need to aid their success. Researchers have studied many issues and solutions for students with low SES, and there is a lot of research going on in many fields, especially in the social sciences. Computer science, however, has not yet as a field turned its considerable potential to addressing these inequalities. Utilizing Natural Language Processing (NLP) methods and technology, our work aims to address these disparities and ways to bridge the gap. We built a simple string matching algorithm including Latent Dirichlet Allocation (LDA) topic model and Open Information Extraction (open IE) to generate relational triples that are connected to the context of the students’ challenges, and the strategies they follow to overcome them. We manually collected 16 narratives about the experiences of low SES students in higher education from a publicly accessible internet forum (Reddit) and tested our model on them. We demonstrate that our strategy is effective (from 37.50% to 80%) in gathering contextual data about low SES students, in particular, about their difficulties while in a higher educational institution and how they improve their situation. A detailed error analysis suggests that increase of data, improvement of the LDA model, and quality of triples can help get better results from our model. For the advantage of other researchers, we make our code available.

pdf bib
Enhancing Crisis-Related Tweet Classification with Entity-Masked Language Modeling and Multi-Task Learning
Philipp Seeberger | Korbinian Riedhammer

Social media has become an important information source for crisis management and provides quick access to ongoing developments and critical information. However, classification models suffer from event-related biases and highly imbalanced label distributions which still poses a challenging task. To address these challenges, we propose a combination of entity-masked language modeling and hierarchical multi-label classification as a multi-task learning problem. We evaluate our method on tweets from the TREC-IS dataset and show an absolute performance gain w.r.t. F1-score of up to 10% for actionable information types. Moreover, we found that entity-masking reduces the effect of overfitting to in-domain events and enables improvements in cross-event generalization.

pdf bib
Misinformation Detection in the Wild: News Source Classification as a Proxy for Non-article Texts
Matyas Bohacek

Creating classifiers of disinformation is time-consuming, expensive, and requires vast effort from experts spanning different fields. Even when these efforts succeed, their roll-out to publicly available applications stagnates. While these models struggle to find their consumer-accessible use, disinformation behavior online evolves at a pressing speed. The hoaxes get shared in various abbreviations on social networks, often in user-restricted areas, making external monitoring and intervention virtually impossible. To re-purpose existing NLP methods for the new paradigm of sharing misinformation, we propose leveraging information about given texts’ originating news sources to proxy the respective text’s trustworthiness. We first present a methodology for determining the sources’ overall credibility. We demonstrate our pipeline construction in a specific language and introduce CNSC: a novel dataset for Czech articles’ news source and source credibility classification. We constitute initial benchmarks on multiple architectures. Lastly, we create in-the-wild wrapper applications of the trained models: a chatbot, a browser extension, and a standalone web application.

pdf bib
Modelling Persuasion through Misuse of Rhetorical Appeals
Amalie Pauli | Leon Derczynski | Ira Assent

It is important to understand how people use words to persuade each other. This helps understand debate, and detect persuasive narratives in regard to e.g. misinformation. While computational modelling of some aspects of persuasion has received some attention, a way to unify and describe the overall phenomenon of when persuasion becomes undesired and problematic, is missing. In this paper, we attempt to address this by proposing a taxonomy of computational persuasion. Drawing upon existing research and resources, this paper shows how to re-frame and re-organise current work into a coherent framework targeting the misuse of rhetorical appeals. As a study to validate these re-framings, we then train and evaluate models of persuasion adapted to our taxonomy. Our results show an application of our taxonomy, and we are able to detecting misuse of rhetorical appeals, finding that these are more often used in misinformative contexts than in true ones.

pdf bib
Breaking through Inequality of Information Acquisition among Social Classes: A Modest Effort on Measuring “Fun”
Chenghao Xiao | Baicheng Sun | Jindi Wang | Mingyue Liu | Jiayi Feng

With the identification of the inequality encoded in information acquisition among social classes, we propose to leverage a powerful concept that has never been studied as a linguistic construct, “fun”, to deconstruct the inequality. Inspired by theories in sociology, we draw connection between social class and information cocoon, through the lens of fun, and hypothesize the measurement of “how fun one’s dominating social cocoon is” to be an indicator of the social class of an individual. Following this, we propose an NLP framework to combat the issue by measuring how fun one’s information cocoon is, and empower individuals to emancipate from their trapped cocoons. We position our work to be a domain-agnostic framework that can be deployed in a lot of downstream cases, and is one that aims to deconstruct, as opposed to reinforcing, the traditional social structure of beneficiaries.

pdf bib
Using NLP to Support English Teaching in Rural Schools
Luis Chiruzzo | Laura Musto | Santiago Gongora | Brian Carpenter | Juan Filevich | Aiala Rosa

We present a web application for creating games and exercises for teaching English as a foreign language with the help of NLP tools. The application contains different kinds of games such as crosswords, word searches, a memory game, and a multiplayer game based on the classic battleship pen and paper game. This application was built with the aim of supporting teachers in rural schools that are teaching English lessons, so they can easily create interactive and engaging activities for their students. We present the context and history of the project, the current state of the web application, and some ideas on how we will expand it in the future.

pdf bib
“Am I Answering My Job Interview Questions Right?”: A NLP Approach to Predict Degree of Explanation in Job Interview Responses
Raghu Verrap | Ehsanul Nirjhar | Ani Nenkova | Theodora Chaspari

Providing the right amount of explanation in an employment interview can help the interviewee effectively communicate their skills and experience to the interviewer and convince the she/he is the right candidate for the job. This paper examines natural language processing (NLP) approaches, including word-based tokenization, lexicon-based representations, and pre-trained embeddings with deep learning models, for detecting the degree of explanation in a job interview response. These are exemplified in a study of 24 military veterans who are the focal group of this study, since they can experience unique challenges in job interviews due to the unique verbal communication style that is prevalent in the military. Military veterans conducted mock interviews with industry recruiters and data from these interviews were transcribed and analyzed. Results indicate that the feasibility of automated NLP methods for detecting the degree of explanation in an interview response. Features based on tokenizer analysis are the most effective in detecting under-explained responses (i.e., 0.29 F1-score), while lexicon-based methods depict the higher performance in detecting over-explanation (i.e., 0.51 F1-score). Findings from this work lay the foundation for the design of intelligent assistive technologies that can provide personalized learning pathways to job candidates, especially those belonging to sensitive or under-represented populations, and helping them succeed in employment job interviews, ultimately contributing to an inclusive workforce.

pdf bib
Identifying Condescending Language: A Tale of Two Distinct Phenomena?
Carla Perez Almendros | Steven Schockaert

Patronizing and condescending language is characterized, among others, by its subtle nature. It thus seems reasonable to assume that detecting condescending language in text would be harder than detecting more explicitly harmful language, such as hate speech. However, the results of a SemEval-2022 Task devoted to this topic paint a different picture, with the top-performing systems achieving remarkably strong results. In this paper, we analyse the surprising effectiveness of standard text classification methods in more detail. In particular, we highlight the presence of two rather different types of condescending language in the dataset from the SemEval task. Some inputs are condescending because of the way they talk about a particular subject, i.e. condescending language in this case is a linguistic phenomenon, which can, in principle, be learned from training examples. However, other inputs are condescending because of the nature of what is said, rather than the way in which it is expressed, e.g. by emphasizing stereotypes about a given community. In such cases, our ability to detect condescending language, with current methods, largely depends on the presence of similar examples in the training data.

pdf bib
BELA: Bot for English Language Acquisition
Muskan Mahajan

In this paper, we introduce a conversational agent (chatbot) for Hindi-speaking youth called BELA—Bot for English Language Acquisition. Developed for young underprivileged students at an Indian non-profit, the agent supports both Hindi and Hinglish (code-switched Hindi and English, written primarily with English orthography) utterances. BELA has two interaction modes: a question-answering mode for classic English language learning tasks like word meanings, translations, reading passage comprehensions, etc., and an open-domain dialogue system mode to allow users to practice their language skills. We present a high-level overview of the design of BELA, including the implementation details and the preliminary results of our early prototype. We also report the challenges in creating an English-language learning chatbot for a largely Hindi-speaking population.

pdf bib
Applicability of Pretrained Language Models: Automatic Screening for Children’s Language Development Level
Byoung-doo Oh | Yoon-koung Lee | Yu-seop Kim

The various potential of children can be limited by language delay or language impairments. However, there are many instances where parents are unaware of the child’s condition and do not obtain appropriate treatment as a result. Additionally, experts collecting children’s utterance to establish norms of language tests and evaluating children’s language development level takes a significant amount of time and work. To address these issues, dependable automated screening tools are required. In this paper, we used pretrained LM to assist experts in quickly and objectively screening the language development level of children. Here, evaluating the language development level is to ensure that the child has the appropriate language abilities for his or her age, which is the same as the child’s age. To do this, we analyzed the utterances of children according to age. Based on these findings, we use the standard deviations of the pretrained LM’s probability as a score for children to screen their language development level. The experiment results showed very strong correlations between our proposed method and the Korean language test REVT (REVT-R, REVT-E), with Pearson correlation coefficient of 0.9888 and 0.9892, respectively.

pdf bib
Transformers-Based Approach for a Sustainability Term-Based Sentiment Analysis (STBSA)
Blaise Sandwidi | Suneer Pallitharammal Mukkolakal

Traditional sentiment analysis is a sentence level or document-level task. However, a sentence or paragraph may contain multiple target terms with different sentiments, making sentiment prediction more challenging. Although pre-trained language models like BERT have been successful, incorporating dynamic semantic changes into aspect-based sentiment models remains difficult, especially for domain-specific sentiment analysis. To this end, in this paper, we propose a Term-Based Sentiment Analysis (TBSA), a novel method designed to learn Environmental, Social, and Governance (ESG) contexts based on a sustainability taxonomy for ESG aspect-oriented sentiment analysis. Notably, we introduce a technique enhancing the ESG term’s attention, inspired by the success of attention-based neural networks in machine translation (Bahdanau et al., 2015) and Computer Vision (Bello et al., 2019). It enables the proposed model to focus on a small region of the sentences at each step and to reweigh the crucial terms for a better understanding of the ESG aspect-aware sentiment. Beyond the novelty in the model design, we propose a new dataset of 125,000+ ESG analyst annotated data points for sustainability term based sentiment classification, which derives from historical sustainability corpus data and expertise acquired by development finance institutions. Our extensive experiments combining the new method and the new dataset demonstrate the effectiveness of the Sustainability TBSA model with an accuracy of 91.30% (90% F1-score). Both internal and external business applications of our model show an evident potential for a significant positive impact toward furthering sustainable development goals (SDGs).

pdf bib
Hate-CLIPper: Multimodal Hateful Meme Classification based on Cross-modal Interaction of CLIP Features
Gokul Karthik Kumar | Karthik Nandakumar

Hateful memes are a growing menace on social media. While the image and its corresponding text in a meme are related, they do not necessarily convey the same meaning when viewed individually. Hence, detecting hateful memes requires careful consideration of both visual and textual information. Multimodal pre-training can be beneficial for this task because it effectively captures the relationship between the image and the text by representing them in a similar feature space. Furthermore, it is essential to model the interactions between the image and text features through intermediate fusion. Most existing methods either employ multimodal pre-training or intermediate fusion, but not both. In this work, we propose the Hate-CLIPper architecture, which explicitly models the cross-modal interactions between the image and text representations obtained using Contrastive Language-Image Pre-training (CLIP) encoders via a feature interaction matrix (FIM). A simple classifier based on the FIM representation is able to achieve state-of-the-art performance on the Hateful Memes Challenge (HMC) dataset with an AUROC of 85.8, which even surpasses the human performance of 82.65. Experiments on other meme datasets such as Propaganda Memes and TamilMemes also demonstrate the generalizability of the proposed approach. Finally, we analyze the interpretability of the FIM representation and show that cross-modal interactions can indeed facilitate the learning of meaningful concepts. The code for this work is available at https://github.com/gokulkarthik/hateclipper