Einat Minkov


2025

Social text understanding is prone to failure when opinions are conveyed implicitly or sarcastically. It is therefore desirable to model users’ contexts when processing the texts they author. In this work, we represent users within a social embedding space learned from the Twitter network at large scale. Similar to word embeddings, which encode lexical semantics, the network embeddings encode latent dimensions of social semantics. We perform extensive experiments on author-informed stance prediction, demonstrating improved generalization through inductive social user modeling, both within and across topics. Similar results were obtained for author-informed toxicity and incivility detection. The proposed approach may pave the way to social NLP that considers user embeddings as a contextual modality. However, our investigation also reveals that user stances correlate with the personal socio-demographic traits encoded in their embeddings. Hence, author-informed NLP approaches may inadvertently model and reinforce socio-demographic and other social biases.
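The idea of treating user embeddings as a contextual modality can be sketched as follows. This is a minimal, illustrative example only: the embedding dimensions, names, and the random placeholder vectors and weights are assumptions, standing in for pre-computed text and social-network embeddings and for a learned classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pre-computed representations (dimensions are illustrative):
# a 300-d text vector for the tweet and a 100-d social embedding of its author.
text_emb = rng.normal(size=300)  # e.g. an encoding of the tweet's text
user_emb = rng.normal(size=100)  # e.g. a network embedding of the author

# Author-informed input: the user embedding acts as an extra contextual
# modality, concatenated with the textual representation.
features = np.concatenate([text_emb, user_emb])

# A linear stance scorer over the joint representation (random placeholder
# weights; in practice these would be learned from labeled data).
w = rng.normal(size=features.shape[0])
score = float(features @ w)

print(features.shape)  # (400,)
```

A downstream model trained on `features` can thus condition its prediction on who authored the text, not only on what was written.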

2024

Toxic online political discourse has become prevalent, and scholars debate its impact on democratic processes. This work presents a large-scale study of political incivility on Twitter. In line with theories of political communication, we differentiate between harsh ‘impolite’ style and intolerant substance. We present a dataset of 13K political tweets in the U.S. context, which we collected and labeled by these categories using crowdsourcing. Our dataset and results shed light on hostile political discourse focused on partisan conflicts in the U.S. The evaluation of state-of-the-art classifiers illustrates the challenges involved in political incivility detection, which often requires high-level semantic and social understanding. Nevertheless, performing incivility detection at scale, we are able to characterize its distribution across individual users and geopolitical regions, where our findings align with and extend existing theories of political communication. In particular, we find that roughly 80% of the uncivil tweets are authored by 20% of the users, and that politically engaged users are more inclined to use uncivil language. We further find that political incivility exhibits network homophily, and that incivility is more prominent in highly competitive geopolitical regions. These results apply to both uncivil style and substance.
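The 80/20 concentration reported above is a simple statistic over per-user counts. A minimal sketch of how such a share is computed (the function name and the toy counts are illustrative, not the paper's data):

```python
def top_share(counts, user_fraction=0.2):
    """Fraction of uncivil tweets authored by the top `user_fraction`
    of users, ranking users by their uncivil-tweet counts."""
    ranked = sorted(counts, reverse=True)
    k = max(1, round(len(ranked) * user_fraction))
    return sum(ranked[:k]) / sum(ranked)

# Toy per-user uncivil-tweet counts (illustrative only):
counts = [40, 25, 10, 5, 3, 2, 2, 1, 1, 1]
print(round(top_share(counts), 2))  # 0.72
```

Here the top 20% of users (2 of 10) account for 65 of 90 uncivil tweets, i.e. about 72% of the total.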

2021

Automatic hate speech detection is hampered by the scarcity of labeled datasets, leading to poor generalization. We employ pretrained language models (LMs) to alleviate this data bottleneck. We utilize the GPT LM to generate large amounts of synthetic hate speech sequences from available labeled examples, and leverage the generated data in fine-tuning large pretrained LMs on hate detection. An empirical study using BERT, RoBERTa, and ALBERT shows that this approach improves generalization significantly and consistently, both within and across data distributions. In fact, we find that generating relevant labeled hate speech sequences is preferable to using out-of-domain, and sometimes even within-domain, human-labeled examples.
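The data-augmentation flow described above can be sketched at a high level. This is a hypothetical outline, not the paper's code: `synthesize` stands in for sampling labeled sequences from a GPT-style LM conditioned on the seed examples, and the toy generator below merely tags seed texts for illustration.

```python
import random

def augment(labeled, synthesize, n_synthetic):
    """Mix human-labeled examples with synthetic (text, label) pairs.

    `synthesize` is a hypothetical stand-in for sampling from a GPT-style
    LM seeded with the labeled examples.
    """
    seen = {text for text, _ in labeled}
    synthetic = []
    while len(synthetic) < n_synthetic:
        text, label = synthesize(labeled)
        if text not in seen:  # discard verbatim duplicates of the seeds
            seen.add(text)
            synthetic.append((text, label))
    combined = labeled + synthetic
    random.shuffle(combined)  # interleave real and synthetic examples
    return combined

# Toy generator: tags a randomly chosen seed example (illustration only).
def toy_synthesize(labeled, _rng=random.Random(0)):
    text, label = _rng.choice(labeled)
    return f"{text} [gen-{_rng.randint(0, 999)}]", label

seeds = [("a labeled hateful example", 1), ("a labeled benign example", 0)]
train_data = augment(seeds, toy_synthesize, n_synthetic=4)
print(len(train_data))  # 6
```

The resulting mixed set would then be used to fine-tune a pretrained classifier such as BERT, RoBERTa, or ALBERT on hate detection.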
