Effi Levi


2024

pdf bib
Reap the Wild Wind: Detecting Media Storms in Large-Scale News Corpora
Dror Kris Markus | Effi Levi | Tamir Sheafer | Shaul Rafael Shenhav
Findings of the Association for Computational Linguistics: EMNLP 2024

Media storms, dramatic outbursts of attention to a story, are central components of media dynamics and the attention landscape. Despite their importance, there has been little systematic and empirical research on this concept due to issues of measurement and operationalization. We introduce an iterative human-in-the-loop method to identify media storms in a large-scale corpus of news articles. The text is first transformed into signals of dispersion based on several textual characteristics. In each iteration, we apply unsupervised anomaly detection to these signals; each anomaly is then validated by an expert to confirm the presence of a storm, and those results are then used to tune the anomaly detection in the next iteration. We make available the resulting media storm dataset. Both the method and dataset provide a basis for comprehensive empirical study of media storms.

pdf bib
Exploring Factual Entailment with NLI: A News Media Study
Guy Mor-Lan | Effi Levi
Proceedings of the 13th Joint Conference on Lexical and Computational Semantics (*SEM 2024)

We explore the relationship between factuality and Natural Language Inference (NLI) by introducing FactRel – a novel annotation scheme that models factual rather than textual entailment, and use it to annotate a dataset of naturally occurring sentences from news articles. Our analysis shows that 84% of factually supporting pairs and 63% of factually undermining pairs do not amount to NLI entailment or contradiction, respectively, suggesting that factual relationships are more apt for analyzing media discourse. We experiment with models for pairwise classification on the new dataset, and find that in some cases, generating synthetic data with GPT-4 on the basis of the annotated dataset can improve performance. Surprisingly, few-shot learning with GPT-4 yields strong results on par with medium LMs (DeBERTa) trained on the labelled dataset. We hypothesize that these results indicate the fundamental dependence of this task on both world knowledge and advanced reasoning abilities.

pdf bib
IsraParlTweet: The Israeli Parliamentary and Twitter Resource
Guy Mor-Lan | Effi Levi | Tamir Sheafer | Shaul R. Shenhav
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

We introduce IsraParlTweet, a new linked corpus of Hebrew-language parliamentary discussions from the Knesset (Israeli Parliament) between the years 1992-2023 and Twitter posts made by Members of the Knesset between the years 2008-2023, containing a total of 294.5 million Hebrew tokens. In addition to raw text, the corpus contains comprehensive metadata on speakers and Knesset sessions as well as several linguistic annotations. As a result, IsraParlTweet can be used to conduct a wide variety of quantitative and qualitative analyses and provide valuable insights into political discourse in Israel.

2023

pdf bib
Factoring Hate Speech: A New Annotation Framework to Study Hate Speech in Social Media
Gal Ron | Effi Levi | Odelia Oshri | Shaul Shenhav
The 7th Workshop on Online Abuse and Harms (WOAH)

In this work we propose a novel annotation scheme which factors hate speech into five separate discursive categories. To evaluate our scheme, we construct a corpus of over 2.9M Twitter posts containing hateful expressions directed at Jews, and annotate a sample dataset of 1,050 tweets. We present a statistical analysis of the annotated dataset as well as discuss annotation examples, and conclude by discussing promising directions for future work.

2022

pdf bib
Detecting Narrative Elements in Informational Text
Effi Levi | Guy Mor | Tamir Sheafer | Shaul Shenhav
Findings of the Association for Computational Linguistics: NAACL 2022

Automatic extraction of narrative elements from text, combining narrative theories with computational models, has been receiving increasing attention over the last few years. Previous works have utilized the oral narrative theory by Labov and Waletzky to identify various narrative elements in personal stories texts. Instead, we direct our focus to informational texts, specifically news stories. We introduce NEAT (Narrative Elements AnnoTation) – a novel NLP task for detecting narrative elements in raw text. For this purpose, we designed a new multi-label narrative annotation scheme, better suited for informational text (e.g. news media), by adapting elements from the narrative theory of Labov and Waletzky (Complication and Resolution) and adding a new narrative element of our own (Success). We then used this scheme to annotate a new dataset of 2,209 sentences, compiled from 46 news articles from various category domains. We trained a number of supervised models in several different setups over the annotated dataset to identify the different narrative elements, achieving an average F1 score of up to 0.77. The results demonstrate the holistic nature of our annotation scheme as well as its robustness to domain category.

2016

pdf bib
Edge-Linear First-Order Dependency Parsing with Undirected Minimum Spanning Tree Inference
Effi Levi | Roi Reichart | Ari Rappoport
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

pdf bib
How Well Do Distributional Models Capture Different Types of Semantic Knowledge?
Dana Rubinstein | Effi Levi | Roy Schwartz | Ari Rappoport
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)