2023
Proceedings of the 6th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text
Ali Hürriyetoğlu | Hristo Tanev | Vanni Zavarella | Reyyan Yeniterzi | Erdem Yörük | Milena Slavcheva
Proceedings of the 6th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text
Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2023): Workshop and Shared Task Report
Ali Hürriyetoğlu | Hristo Tanev | Osman Mutlu | Surendrabikram Thapa | Fiona Anting Tan | Erdem Yörük
Proceedings of the 6th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text
We provide a summary of the sixth edition of the CASE workshop, held in the scope of RANLP 2023. The workshop consists of regular papers, three keynotes, working papers of shared task participants, and shared task overview papers. This workshop series has been bringing together all aspects of event information collection across technical and social science fields. In addition to contributing to the progress in text-based event extraction, the workshop provides a space for the organization of a multimodal event information collection task.
2022
Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE)
Ali Hürriyetoğlu | Hristo Tanev | Vanni Zavarella | Erdem Yörük
Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE)
Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2022): Workshop and Shared Task Report
Ali Hürriyetoğlu | Hristo Tanev | Vanni Zavarella | Reyyan Yeniterzi | Osman Mutlu | Erdem Yörük
Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE)
We provide a summary of the fifth edition of the CASE workshop, held in the scope of EMNLP 2022. The workshop consists of regular papers, two keynotes, working papers of shared task participants, and task overview papers. This workshop has been bringing together all aspects of event information collection across technical and social science fields. In addition to the progress in depth, the submission and acceptance of multimodal approaches show the widening of this interdisciplinary research topic.
Extended Multilingual Protest News Detection - Shared Task 1, CASE 2021 and 2022
Ali Hürriyetoğlu | Osman Mutlu | Fırat Duruşan | Onur Uca | Alaeddin Gürel | Benjamin J. Radford | Yaoyao Dai | Hansi Hettiarachchi | Niklas Stoehr | Tadashi Nomoto | Milena Slavcheva | Francielle Vargas | Aaqib Javid | Fatih Beyhan | Erdem Yörük
Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE)
We report results of the CASE 2022 Shared Task 1 on Multilingual Protest Event Detection. This task is a continuation of CASE 2021 and consists of four subtasks: i) document classification, ii) sentence classification, iii) event sentence coreference identification, and iv) event extraction. The CASE 2022 extension consists of expanding the test data with more data in previously available languages, namely English, Hindi, Portuguese, and Spanish, and adding new test data in Mandarin, Turkish, and Urdu for Subtask 1, document classification. The training data from CASE 2021 in English, Portuguese, and Spanish were utilized. Therefore, predicting document labels in Hindi, Mandarin, Turkish, and Urdu occurs in a zero-shot setting. The CASE 2022 workshop accepts reports on systems developed for predicting test data of CASE 2021 as well. We observe that the best systems submitted by CASE 2022 participants achieve between 79.71 and 84.06 F1-macro for new languages in a zero-shot setting. The winning approaches are mainly ensembling models and merging data in multiple languages. The best two submissions on CASE 2021 data outperform submissions from last year for Subtask 1 and Subtask 2 in all languages. Only the following scenarios were not outperformed by new submissions on CASE 2021: Subtask 3 Portuguese and Subtask 4 English.
2021
Multilingual Protest News Detection - Shared Task 1, CASE 2021
Ali Hürriyetoğlu | Osman Mutlu | Erdem Yörük | Farhana Ferdousi Liza | Ritesh Kumar | Shyam Ratan
Proceedings of the 4th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021)
Benchmarking state-of-the-art text classification and information extraction systems in multilingual, cross-lingual, few-shot, and zero-shot settings for socio-political event information collection is achieved in the scope of the shared task Socio-political and Crisis Events Detection at the workshop CASE @ ACL-IJCNLP 2021. Socio-political event data is utilized for national and international policy- and decision-making. Therefore, the reliability and validity of these datasets are of the utmost importance. We split the shared task into three parts to address the three aspects of data collection (Task 1), fine-grained semantic classification (Task 2), and evaluation (Task 3). Task 1, which is the focus of this report, is on multilingual protest news detection and comprises four subtasks: document classification (subtask 1), sentence classification (subtask 2), event sentence coreference identification (subtask 3), and event extraction (subtask 4). All subtasks had English, Portuguese, and Spanish for both training and evaluation data. Data in Hindi was available only for the evaluation of subtask 1. The majority of the submissions, 238 in total, were created using multi- and cross-lingual approaches. Best scores are above 77.27 F1-macro for subtask 1, above 85.32 F1-macro for subtask 2, above 84.23 CoNLL 2012 average score for subtask 3, and above 66.20 F1-macro for subtask 4 in all evaluation settings. The performance of the best system for subtask 4 is above 66.20 F1 for all available languages. Although there is still significant room for improvement in cross-lingual and zero-shot settings, the best submissions for each evaluation scenario yield remarkable results. Monolingual models outperformed the multilingual models in a few evaluation scenarios.
2020
COVCOR20 at WNUT-2020 Task 2: An Attempt to Combine Deep Learning and Expert rules
Ali Hürriyetoğlu | Ali Safaya | Osman Mutlu | Nelleke Oostdijk | Erdem Yörük
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)
In the scope of WNUT-2020 Task 2, we developed several text classification systems: two using deep learning models and one using linguistically informed rules. While both of the deep learning systems outperformed the system using the linguistically informed rules, we found that integrating the output of the three systems achieved better performance than each approach on its own in a cross-validation setting. However, on the test data the performance of the integration was slightly lower than that of our best-performing deep learning model. These results hardly indicate any progress toward integrating machine learning and expert-rule-driven systems. We expect that the release of the annotation manuals and gold labels of the test data after this workshop will shed light on these perplexing results.
Proceedings of the Workshop on Automated Extraction of Socio-political Events from News 2020
Ali Hürriyetoğlu | Erdem Yörük | Vanni Zavarella | Hristo Tanev
Proceedings of the Workshop on Automated Extraction of Socio-political Events from News 2020
Automated Extraction of Socio-political Events from News (AESPEN): Workshop and Shared Task Report
Ali Hürriyetoğlu | Vanni Zavarella | Hristo Tanev | Erdem Yörük | Ali Safaya | Osman Mutlu
Proceedings of the Workshop on Automated Extraction of Socio-political Events from News 2020
We describe our effort on automated extraction of socio-political events from news in the scope of a workshop and a shared task we organized at the Language Resources and Evaluation Conference (LREC 2020). We believe the event extraction studies in computational linguistics and social and political sciences should further support each other in order to enable large-scale socio-political event information collection across sources, countries, and languages. The event consists of a regular research paper track and a shared task track; the shared task is about event sentence coreference identification (ESCI). All submissions were reviewed by five members of the program committee. The workshop attracted research papers related to evaluation of machine learning methodologies, language resources, material conflict forecasting, and a shared task participation report in the scope of socio-political event information collection. It has shown us the volume and variety of both the data sources and event information collection approaches related to socio-political events and the need to fill the gap between automated text processing techniques and requirements of social and political sciences.
2016
Towards Building a Political Protest Database to Explain Changes in the Welfare State
Çağıl Sönmez | Arzucan Özgür | Erdem Yörük
Proceedings of the 10th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities