Reap the Wild Wind: Detecting Media Storms in Large-Scale News Corpora

Dror Markus, Effi Levi, Tamir Sheafer, Shaul Shenhav


Abstract
Media storms, dramatic outbursts of attention to a story, are central components of media dynamics and the attention landscape. Despite their importance, there has been little systematic and empirical research on this concept due to issues of measurement and operationalization. We introduce an iterative human-in-the-loop method to identify media storms in a large-scale corpus of news articles. The text is first transformed into signals of dispersion based on several textual characteristics. In each iteration, we apply unsupervised anomaly detection to these signals; each anomaly is then validated by an expert to confirm the presence of a storm, and those results are then used to tune the anomaly detection in the next iteration. We make available the resulting media storm dataset. Both the method and dataset provide a basis for comprehensive empirical study of media storms.
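The iterative loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the z-score detector, the threshold-tuning rule, and every name here (`detect_anomalies`, `iterative_storm_detection`, `expert_confirms`, `threshold`, `step`) are assumptions chosen for illustration. The abstract does not specify which anomaly detector or tuning procedure is used, nor whether storms appear as unusually high or low dispersion, so the sketch flags deviations in either direction.

```python
from statistics import mean, stdev
from typing import Callable, List, Set


def detect_anomalies(signal: List[float], threshold: float) -> List[int]:
    """Return indices whose absolute z-score exceeds the threshold.

    A plain z-score is an assumed stand-in for the unspecified
    unsupervised anomaly detector.
    """
    mu = mean(signal)
    sigma = stdev(signal)
    if sigma == 0:
        return []  # a constant signal has no anomalies
    return [i for i, x in enumerate(signal) if abs(x - mu) / sigma > threshold]


def iterative_storm_detection(
    signal: List[float],
    expert_confirms: Callable[[int], bool],  # hypothetical human validation step
    threshold: float = 3.0,
    step: float = 0.5,
    min_threshold: float = 1.0,
    max_iterations: int = 5,
) -> Set[int]:
    """Alternate between anomaly detection and expert validation."""
    confirmed: Set[int] = set()
    reviewed: Set[int] = set()
    for _ in range(max_iterations):
        # Only show the expert anomalies that have not been reviewed yet.
        candidates = [
            i for i in detect_anomalies(signal, threshold) if i not in reviewed
        ]
        hits = {i for i in candidates if expert_confirms(i)}
        reviewed.update(candidates)
        confirmed |= hits
        # Assumed tuning rule: loosen the threshold when the expert confirms
        # most candidates (or none were found), tighten it otherwise.
        if not candidates or len(hits) / len(candidates) >= 0.5:
            threshold = max(min_threshold, threshold - step)
        else:
            threshold += step
    return confirmed


# Toy dispersion signal (invented values): index 10 is a sharp drop that the
# simulated expert accepts as a storm; index 16 is a milder dip they reject.
toy_signal = [10, 10.5, 9.5, 10, 10.2, 9.8, 10, 10.1, 9.9, 10,
              2.0, 10, 10.3, 9.7, 10, 10, 6.0, 10, 10.1, 9.9]
storms = iterative_storm_detection(toy_signal, lambda i: i == 10)
print(sorted(storms))
```

In practice the `expert_confirms` callback would be replaced by a manual review of the articles published around each flagged time point, and the signal would be one of several dispersion measures derived from the text.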
Anthology ID: 2024.findings-emnlp.275
Volume: Findings of the Association for Computational Linguistics: EMNLP 2024
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 4786–4797
URL: https://aclanthology.org/2024.findings-emnlp.275
Cite (ACL): Dror Markus, Effi Levi, Tamir Sheafer, and Shaul Shenhav. 2024. Reap the Wild Wind: Detecting Media Storms in Large-Scale News Corpora. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 4786–4797, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): Reap the Wild Wind: Detecting Media Storms in Large-Scale News Corpora (Markus et al., Findings 2024)
PDF: https://aclanthology.org/2024.findings-emnlp.275.pdf