%0 Conference Proceedings
%T Discovering Black Lives Matter Events in the United States: Shared Task 3, CASE 2021
%A Giorgi, Salvatore
%A Zavarella, Vanni
%A Tanev, Hristo
%A Stefanovitch, Nicolas
%A Hwang, Sy
%A Hettiarachchi, Hansi
%A Ranasinghe, Tharindu
%A Kalyan, Vivek
%A Tan, Paul
%A Tan, Shaun
%A Andrews, Martin
%A Hu, Tiancheng
%A Stoehr, Niklas
%A Re, Francesco Ignazio
%A Vegh, Daniel
%A Atzenhofer, Dennis
%A Curtis, Brenda
%A Hürriyetoğlu, Ali
%Y Hürriyetoğlu, Ali
%S Proceedings of the 4th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021)
%D 2021
%8 August
%I Association for Computational Linguistics
%C Online
%F giorgi-etal-2021-discovering
%X State-of-the-art event detection systems are rarely evaluated on how well they determine the spatio-temporal distribution of events on the ground. However, the ability both (1) to extract events "in the wild" from text and (2) to properly evaluate event detection systems has the potential to support a wide variety of tasks, such as monitoring the activity of socio-political movements, examining media coverage of and public support for these movements, and informing policy decisions. We therefore study the performance of the best event detection systems at detecting Black Lives Matter (BLM) events in tweets and news articles. The murder of George Floyd, an unarmed Black man, at the hands of police officers received global attention throughout the second half of 2020. Protests against police violence emerged worldwide, and the BLM movement, which had once been mostly relegated to the United States, began seeing activity globally. This shared task asks participants to identify BLM-related events in large unstructured data sources using systems pretrained to extract socio-political events from text. We evaluate several metrics, assessing each system's ability to identify protest events both temporally and spatially. Results show that identifying daily protest counts is an easier task than classifying spatial and temporal protest trends simultaneously, with maximum performance of 0.745 and 0.210 (Pearson r), respectively. Additionally, all baselines and participant systems suffered from low recall, with a maximum recall of 5.08.
%R 10.18653/v1/2021.case-1.27
%U https://aclanthology.org/2021.case-1.27
%U https://doi.org/10.18653/v1/2021.case-1.27
%P 218-227