Shaul Rafael Shenhav
2024
Reap the Wild Wind: Detecting Media Storms in Large-Scale News Corpora
Dror Kris Markus | Effi Levi | Tamir Sheafer | Shaul Rafael Shenhav
Findings of the Association for Computational Linguistics: EMNLP 2024
Media storms, dramatic outbursts of attention to a story, are central components of media dynamics and the attention landscape. Despite their importance, there has been little systematic and empirical research on this concept due to issues of measurement and operationalization. We introduce an iterative human-in-the-loop method to identify media storms in a large-scale corpus of news articles. The text is first transformed into signals of dispersion based on several textual characteristics. In each iteration, we apply unsupervised anomaly detection to these signals; each anomaly is then validated by an expert to confirm the presence of a storm, and those results are then used to tune the anomaly detection in the next iteration. We make available the resulting media storm dataset. Both the method and dataset provide a basis for comprehensive empirical study of media storms.
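The iterative loop described in the abstract can be illustrated with a minimal sketch. The abstract does not specify which textual characteristics, dispersion measures, or anomaly detector the authors use, so everything here is an assumption made for illustration: the dispersion signal is the Shannon entropy of each day's topic labels, the detector is a simple z-score threshold, and the tuning rule loosens or tightens that threshold depending on how many flagged days the expert confirms. The names `dispersion_signal`, `detect_anomalies`, `find_storms`, and `expert_confirms` are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def dispersion_signal(daily_topics):
    """Shannon entropy (in bits) of each day's topic labels.

    Assumption: low entropy means attention is concentrated on one story.
    """
    signal = []
    for topics in daily_topics:
        counts = Counter(topics)
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        signal.append(entropy)
    return signal


def detect_anomalies(signal, threshold):
    """Indices of days whose entropy lies more than `threshold` standard
    deviations below the mean (a stand-in for unsupervised anomaly detection)."""
    if len(signal) < 2:
        return []
    mu, sigma = mean(signal), pstdev(signal)
    if sigma == 0:
        return []
    return [i for i, x in enumerate(signal) if (mu - x) / sigma > threshold]


def find_storms(signal, expert_confirms, threshold=2.0, step=0.25, rounds=5):
    """Human-in-the-loop iteration: detect, validate, then tune the detector.

    `expert_confirms(day_index) -> bool` stands in for the expert's judgment.
    """
    confirmed, reviewed = set(), set()
    for _ in range(rounds):
        # Only ask the expert about days that have not been reviewed yet.
        new = [d for d in detect_anomalies(signal, threshold) if d not in reviewed]
        hits = [d for d in new if expert_confirms(d)]
        reviewed.update(new)
        confirmed.update(hits)
        # Hypothetical tuning rule: tighten the threshold when most new
        # candidates were rejected, otherwise loosen it to surface more.
        if new and len(hits) < len(new) / 2:
            threshold += step
        else:
            threshold -= step
    return sorted(confirmed), threshold
```

In this sketch the expert is consulted only on newly flagged days, so the validation cost per iteration stays proportional to the number of fresh candidates rather than to the corpus size.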