Hristo Tanev

Also published as: Hristo Tannev

2025

Challenges and Applications of Automated Extraction of Socio-political Events at the age of Large Language Models
Surendrabikram Thapa | Surabhi Adhikari | Hristo Tanev | Ali Hurriyetoglu
Proceedings of the 8th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Texts

Socio-political event extraction (SPE) enables automated identification of critical events such as protests, conflicts, and policy shifts from unstructured text. As a foundational tool for journalism, social science research, and crisis response, SPE plays a key role in understanding complex global dynamics. The emergence of large language models (LLMs) like GPT-4 and LLaMA offers new opportunities for flexible, multilingual, and zero-shot SPE. However, applying LLMs to this domain introduces significant risks, including hallucinated outputs, lack of transparency, geopolitical bias, and potential misuse in surveillance or censorship. This position paper critically examines the promises and pitfalls of LLM-driven SPE, drawing on recent datasets and benchmarks. We argue that SPE is a high-stakes application requiring rigorous ethical scrutiny, interdisciplinary collaboration, and transparent design practices. We propose a research agenda focused on reproducibility, participatory development, and building systems that align with democratic values and the rights of affected communities.

This paper presents the Shared Task on Multimodal Detection of Hate Speech, Humor, and Stance in Marginalized Socio-Political Movement Discourse, hosted at CASE 2025. The task is built on the PrideMM dataset, a curated collection of 5,063 text-embedded images related to the LGBTQ+ pride movement, annotated for four interrelated subtasks: (A) Hate Speech Detection, (B) Hate Target Classification, (C) Topical Stance Classification, and (D) Intended Humor Detection. Eighty-nine teams registered, with competitive submissions across all subtasks. The results show that multimodal approaches consistently outperform unimodal baselines, particularly for hate speech detection, while fine-grained tasks such as target identification and stance classification remain challenging due to label imbalance, multimodal ambiguity, and implicit or culturally specific content. CLIP-based models and parameter-efficient fusion architectures achieved strong performance, showing promising directions for low-resource and efficient multimodal systems.

Exploring the Performance of Large Language Models for Event Detection and Extraction in the Health Domain
Hristo Tanev | Nicolas Stefanovitch | Tomáš Harmatha | Diana F. Sousa
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

Large Language Models (LLMs) have entered the world of NLP at a fast pace. LLMs have been used for summarization, translation, named entity recognition, and sentiment analysis. Recently, different research groups have experimented with event detection and extraction, using LLMs at various stages of processing: LLMs have proven to be a very relevant technology from data preparation to event argument extraction. In particular, open-source LLMs like Mistral are very important since they can be shared and modified by the research community. Still, little effort has been made to study the performance of these models in NLP tasks like event extraction. In this paper we describe an experiment evaluating several state-of-the-art open LLMs on the tasks of event detection and event extraction in the health domain. The models were prompted to detect health-related events: mostly disease outbreaks, but also natural and man-made disasters that directly or indirectly affect people's health. The models were also asked to extract the place, the time, the number of human and animal cases, and the number of human fatalities. The test data was a set of 800 news abstracts, each containing the title and the lead sentences of a health-related news article. We compared the event detection and event argument extraction performance of the open LLMs with that of two knowledge-based event extraction systems, NEXUS and Medical NEXUS. Our evaluation shows that all the open LLMs outperform the knowledge-based systems. The largest improvement was in detecting the number of human fatalities, where the F1 score rose by 0.2 (0.84 vs. 0.64); the best-performing LLM was Llama 3.3 70B Instruct.

Leveraging LLaMa for Abstractive Text Summarisation in Malayalam: An Experimental Study
Hristo Tanev | Anitha S. Pillai | Revathy V. R
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

Recent years have witnessed tremendous advancements in natural language processing (NLP) because of the development of complex language models that have automated several NLP applications, including text summarisation. Despite this progress, Malayalam text summarisation still faces challenges because of the peculiarities of the language. This research paper explores the potential of using a large language model, specifically the LLaMA (Large Language Model Meta AI) framework, for text summarisation of the Malayalam language. To assess the performance of LLaMA on text summarisation for the low-resource language Malayalam, a dataset of reference texts and summaries was curated. The evaluation showed that the LLaMA model could effectively summarise lengthy articles while maintaining important information and coherence. The generated summaries were compared with reference summaries written by humans to observe how closely the model aligned with a human level of summarisation. The results showed that LLMs can handle the Malayalam text summarisation task, but more research is needed to understand the most relevant training strategy.
