Einat Minkov
2025
Towards Author-informed NLP: Mind the Social Bias
Inbar Pendzel | Einat Minkov
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Social text understanding is prone to fail when opinions are conveyed implicitly or sarcastically. It is therefore desirable to model users’ contexts when processing the texts they author. In this work, we represent users within a social embedding space that was learned from the Twitter network at large scale. Similar to word embeddings that encode lexical semantics, the network embeddings encode latent dimensions of social semantics. We perform extensive experiments on author-informed stance prediction, demonstrating improved generalization through inductive social user modeling, both within and across topics. Similar results were obtained for author-informed toxicity and incivility detection. The proposed approach may pave the way to social NLP that considers user embeddings as a contextual modality. However, our investigation also reveals that user stances are correlated with the personal socio-demographic traits encoded in their embeddings. Hence, author-informed NLP approaches may inadvertently model and reinforce socio-demographic and other social biases.
2024
A Closer Look at Multidimensional Online Political Incivility
Sagi Pendzel | Nir Lotan | Alon Zoizner | Einat Minkov
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Toxic online political discourse has become prevalent, and scholars debate its impact on democratic processes. This work presents a large-scale study of political incivility on Twitter. In line with theories of political communication, we differentiate between harsh ‘impolite’ style and intolerant substance. We present a dataset of 13K political tweets in the U.S. context, which we collected and labeled by those categories using crowdsourcing. Our dataset and results shed light on hostile political discourse focused on partisan conflicts in the U.S. The evaluation of state-of-the-art classifiers illustrates the challenges involved in political incivility detection, which often requires high-level semantic and social understanding. Nevertheless, by performing incivility detection at scale, we are able to characterise its distribution across individual users and geopolitical regions, and our findings align with and extend existing theories of political communication. In particular, we find that roughly 80% of the uncivil tweets are authored by 20% of the users, and that users who are politically engaged are more inclined to use uncivil language. We further find that political incivility exhibits network homophily, and that incivility is more prominent in highly competitive geopolitical regions. Our results apply to both uncivil style and substance.
2021
Fight Fire with Fire: Fine-tuning Hate Detectors using Large Samples of Generated Hate Speech
Tomer Wullach | Amir Adler | Einat Minkov
Findings of the Association for Computational Linguistics: EMNLP 2021
Automatic hate speech detection is hampered by the scarcity of labeled datasets, leading to poor generalization. We employ pretrained language models (LMs) to alleviate this data bottleneck. We utilize the GPT LM for generating large amounts of synthetic hate speech sequences from available labeled examples, and leverage the generated data in fine-tuning large pretrained LMs on hate detection. An empirical study using the models of BERT, RoBERTa and ALBERT shows that this approach improves generalization significantly and consistently within and across data distributions. In fact, we find that generating relevant labeled hate speech sequences is preferable to using out-of-domain, and sometimes also within-domain, human-labeled examples.
2016
Multi-source named entity typing for social media
Reuth Vexler | Einat Minkov
Proceedings of the Sixth Named Entity Workshop
2015
Learning to Identify the Best Contexts for Knowledge-based WSD
Evgenia Wasserman Pritsker | William Cohen | Einat Minkov
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
Learning Relational Features with Backward Random Walks
Ni Lao | Einat Minkov | William Cohen
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
2012
Discriminative Learning for Joint Template Filling
Einat Minkov | Luke Zettlemoyer
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Graph Based Similarity Measures for Synonym Extraction from Parsed Text
Einat Minkov | William Cohen
Workshop Proceedings of TextGraphs-7: Graph-based Methods for Natural Language Processing
2008
Learning Graph Walk Based Similarity Measures for Parsed Text
Einat Minkov | William W. Cohen
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing
2007
Generating Complex Morphology for Machine Translation
Einat Minkov | Kristina Toutanova | Hisami Suzuki
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics
2006
NER Systems that Suit User’s Preferences: Adjusting the Recall-Precision Trade-off for Entity Extraction
Einat Minkov | Richard Wang | Anthony Tomasic | William Cohen
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers
A Graphical Framework for Contextual Search and Name Disambiguation in Email
Einat Minkov | William Cohen | Andrew Ng
Proceedings of TextGraphs: the First Workshop on Graph Based Methods for Natural Language Processing