Oguzhan Ozcelik


MiDe22: An Annotated Multi-Event Tweet Dataset for Misinformation Detection
Cagri Toraman | Oguzhan Ozcelik | Furkan Sahinuc | Fazli Can
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The rapid dissemination of misinformation through online social networks is a pressing issue with harmful consequences for human health, public safety, democracy, and the economy; urgent action is therefore required to address it. In this study, we construct a new human-annotated dataset, called MiDe22, containing 5,284 English and 5,064 Turkish tweets with their misinformation labels for several recent events between 2020 and 2022, including the Russia-Ukraine war, the COVID-19 pandemic, and refugees. The dataset includes user engagements with the tweets in terms of likes, replies, retweets, and quotes. We also provide a detailed data analysis with descriptive statistics and the experimental results of a benchmark evaluation for misinformation detection.

ARC-NLP at ClimateActivism 2024: Stance and Hate Speech Detection by Generative and Encoder Models Optimized with Tweet-Specific Elements
Ahmet Kaya | Oguzhan Ozcelik | Cagri Toraman
Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024)

Social media users often express hate speech towards specific targets and may either support or oppose activist movements. The automated detection of hate speech, which involves identifying both targets and stances, plays a critical role in event identification to mitigate its negative effects. In this paper, we present our methods for three subtasks of the Climate Activism Stance and Hate Event Detection Shared Task at CASE 2024. For each subtask, namely (i) hate speech identification, (ii) identification of hate speech targets, and (iii) stance detection, we experiment with optimized Transformer-based architectures that focus on tweet-specific features such as hashtags, URLs, and emojis. Furthermore, we investigate generative large language models, such as Llama2, using specific prompts for the first two subtasks. Our experiments show that our models outperform the baseline models in each subtask. Our solutions also achieve third, fourth, and first place, respectively, in the three subtasks.

ARC-NLP at Multimodal Hate Speech Event Detection 2023: Multimodal Methods Boosted by Ensemble Learning, Syntactical and Entity Features
Umitcan Sahin | Izzet Emre Kucukkaya | Oguzhan Ozcelik | Cagri Toraman
Proceedings of the 6th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text

Text-embedded images can serve as a means of spreading hate speech, propaganda, and extremist beliefs. Throughout the Russia-Ukraine war, both opposing factions relied heavily on text-embedded images as a vehicle for spreading propaganda and hate speech. Effective detection of hate speech and propaganda is therefore essential to mitigate the negative effects of their dissemination. In this paper, we outline our methodologies for the two subtasks of Multimodal Hate Speech Event Detection 2023. For the first subtask, hate speech detection, we use multimodal deep learning models boosted by ensemble learning and syntactical text attributes. For the second subtask, target detection, we employ multimodal deep learning models boosted by named entity features. Our experiments show that our models outperform all textual, visual, and text-visual baselines employed in multimodal hate speech detection. Furthermore, our models achieve first place in both subtasks on the final leaderboard of the shared task.

Cross-Lingual Transfer Learning for Misinformation Detection: Investigating Performance Across Multiple Languages
Oguzhan Ozcelik | Arda Sarp Yenicesu | Onur Yildirim | Dilruba Sultan Haliloglu | Erdem Ege Eroglu | Fazli Can
Proceedings of the 4th Conference on Language, Data and Knowledge

ARC-NLP at CASE 2022 Task 1: Ensemble Learning for Multilingual Protest Event Detection
Umitcan Sahin | Oguzhan Ozcelik | Izzet Emre Kucukkaya | Cagri Toraman
Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE)

Automated socio-political protest event detection is a challenging task when multiple languages are considered. In CASE 2022 Task 1, we propose ensemble learning methods for multilingual protest event detection in four subtasks with different levels of granularity, from document level to entity level. We develop an ensemble of fine-tuned Transformer-based language models, along with a post-processing step to regularize the predictions of our ensembles. Our approach ranks first in 6 of the 16 leaderboards, which span seven languages including English, Mandarin, and Turkish.