Stephan Lewandowsky


2023

IRMA: the 335-million-word Italian coRpus for studying MisinformAtion
Fabio Carrella | Alessandro Miani | Stephan Lewandowsky
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

The dissemination of false information on the internet has received considerable attention over the last decade. Misinformation often spreads faster than mainstream news, making manual fact-checking inefficient or, at best, labor-intensive. There is therefore an increasing need for methods that detect misinformation automatically. Although resources for creating such methods are available in English, other languages are often under-represented in this effort. With this contribution, we present IRMA, a corpus containing over 600,000 Italian news articles (335+ million tokens) collected from 56 websites classified as ‘untrustworthy’ by professional fact-checkers. The corpus is freely available and comprises a rich set of text- and website-level data, representing a turnkey resource for testing hypotheses and developing automatic detection algorithms. It contains texts, titles, and dates (from 2004 to 2022), along with three types of semantic measures (i.e., keywords, topics at three different resolutions, and LIWC lexical features). IRMA also includes domain-specific information such as source type (e.g., political, health, conspiracy), credibility, and higher-level metadata, including several metrics of website incoming traffic that allow researchers to investigate users’ online behavior. IRMA constitutes the largest corpus of misinformation available in Italian today, making it a valuable tool for advancing quantitative research on untrustworthy news detection and, ultimately, for helping limit the spread of misinformation.
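As a rough illustration of how such a corpus might be queried once downloaded, the sketch below filters articles by the source-type and date metadata the abstract describes. The file name and column names are assumptions made for illustration; IRMA's actual distribution format is not specified here.

```python
# Illustrative sketch only: the file name and column names below are
# assumptions, not IRMA's documented schema.
import pandas as pd

irma = pd.read_csv("irma_corpus.csv", parse_dates=["date"])

# Select health-related articles from 2020 onward, using the assumed
# 'source_type' and 'date' metadata fields.
health_recent = irma[(irma["source_type"] == "health")
                     & (irma["date"] >= "2020-01-01")]

print(len(health_recent), "health-related articles since 2020")
```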

You Are What You Read: Inferring Personality From Consumed Textual Content
Adam Sutton | Almog Simchon | Matthew Edwards | Stephan Lewandowsky
Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis

In this work we infer Big-5 personality inventories from consumed text, using data we collected from the social media platform Reddit. We test our model on two datasets, sampled from participants who consumed either fiction content (N = 913) or news content (N = 213). We show that state-of-the-art models from a similar task using authored text do not translate well to this task, with average correlations of r=.06 between the model’s predictions and ground-truth personality inventory dimensions. We propose an alternative method of generating average personality labels for each piece of text consumed, under which our model achieves correlations as high as r=.34 when predicting personality from the text being read.
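A minimal sketch of the label-averaging idea described above: each consumed text receives the mean Big-5 scores of the participants who read it, and those averages can then serve as regression targets. The data structures and names here are assumptions for illustration, not the authors' released code.

```python
import numpy as np

TRAITS = ["O", "C", "E", "A", "N"]  # the five Big-5 dimensions

def average_personality_labels(consumption, inventories):
    """consumption: {text_id: [reader_id, ...]};
    inventories: {reader_id: {trait: score}} from the Big-5 questionnaire."""
    labels = {}
    for text_id, readers in consumption.items():
        # Stack each reader's trait vector, then average per dimension.
        scores = np.array([[inventories[r][t] for t in TRAITS]
                           for r in readers])
        labels[text_id] = dict(zip(TRAITS, scores.mean(axis=0)))
    return labels

# Toy usage with two readers of one text.
inv = {"u1": dict(zip(TRAITS, [3, 4, 2, 5, 1])),
       "u2": dict(zip(TRAITS, [5, 2, 2, 3, 3]))}
print(average_personality_labels({"t1": ["u1", "u2"]}, inv))
```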

Communicating Climate Change: A Comparison Between Tweets and Speeches by German Members of Parliament
Robin Schaefer | Christoph Abels | Stephan Lewandowsky | Manfred Stede
Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis

Twitter and parliamentary speeches are very different communication channels, but many members of parliament (MPs) make use of both. Focusing on the topic of climate change, we undertake a comparative analysis of speeches and tweets uttered by MPs in Germany in a recent six-year period. By keyword/hashtag analyses and topic modeling, we find substantial differences along party lines, with left-leaning parties discussing climate change through a crisis frame, while liberal and conservative parties try to address climate change through the lens of climate-friendly technology and practices. Only the AfD denies the need to adopt climate change mitigating measures, demeaning those concerned about a deteriorating climate as climate cult or fanatics. Our analysis reveals that climate change communication does not differ substantially between Twitter and parliamentary speeches, but across the political spectrum.