Kalina Bontcheva

Other people with similar names: Kalina Bontcheva, Katina Bontcheva

Unverified author pages with similar names: Kalina Bontcheva

2025

GateNLP at SemEval-2025 Task 10: Hierarchical Three-Step Prompting for Multilingual Narrative Classification
Iknoor Singh | Carolina Scarton | Kalina Bontcheva
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

The proliferation of online news and the increasing spread of misinformation necessitate robust methods for automated narrative classification. This paper presents our approach to SemEval 2025 Task 10 Subtask 2, which aims to classify news articles into a predefined two-level taxonomy of main narratives and sub-narratives across multiple languages. We propose Hierarchical Three-Step Prompting (H3Prompt) for multilingual narrative classification. Our methodology follows a three-step prompting strategy, where the model first categorises an article into one of two domains (Ukraine-Russia War or Climate Change), then identifies the most relevant main narratives, and finally assigns sub-narratives. Our approach secured the top position on the English test set among 28 competing teams worldwide. This result highlights the effectiveness of our method in improving narrative classification performance over the baselines.

pdf bib abs

SheffieldGATE at SemEval-2025 Task 2: Multi-Stage Reasoning with Knowledge Fusion for Entity Translation
Xinye Yang | Kalina Bontcheva | Xingyi Song
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

This paper describes the machine translation system submitted to the SemEval-2025 Entity-Aware Machine Translation Task by the SheffieldGATE Team. We proposed a multi-agent entity-aware machine translation system that operates through three distinct reasoning stages: entity recognition, knowledge enhancement, and translation decision-making. The innovation in our approach lies in leveraging large language models to generate contextually relevant queries during the knowledge enhancement stage, extracting candidate entities and their translations from external knowledge bases. In the final translation decision-making stage, we employ fine-tuned large language models to denoise the retrieved knowledge, selecting the most relevant entity information to ensure accurate translation of the original text. Experimental results demonstrate our system’s effectiveness. In emEval-2025 Task 2, our system ranks first among all systems in Spanish entity translation metrics and third in Italian. For systems that do not use gold standard entity IDs during test set inference, ours achieves the highest overall scores across four language pairs: German, French, Italian, and Spanish.

pdf bib abs

Hostility Detection in UK Politics: A Dataset on Online Abuse Targeting MPs
Mugdha Pandya | Mali Jin | Kalina Bontcheva | Diana Maynard
Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH)

Social media platforms, particularly X, enable direct interaction between politicians and constituents but also expose politicians to hostile responses targetting both their governmental role and personal identity. This online hostility can undermine public trust and potentially incite offline violence. While general hostility detection models exist, they lack the specificity needed for political contexts and country-specific issues. We address this gap by creating a dataset of 3,320 English tweets directed at UK Members of Parliament (MPs) over two years, annotated for hostility and targeted identity characteristics (race, gender, religion). Through linguistic and topical analyses, we examine the unique features of UK political discourse and evaluate pre-trained language models and large language models on binary hostility detection and multi-class targeted identity type classification tasks. Our work provides essential data and insights for studying politics-related hostility in the UK.

2024

pdf bib abs

Examining Temporalities on Stance Detection towards COVID-19 Vaccination
Yida Mu | Mali Jin | Kalina Bontcheva | Xingyi Song
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Previous studies have highlighted the importance of vaccination as an effective strategy to control the transmission of the COVID-19 virus. It is crucial for policymakers to have a comprehensive understanding of the public’s stance towards vaccination on a large scale. However, attitudes towards COVID-19 vaccination, such as pro-vaccine or vaccine hesitancy, have evolved over time on social media. Thus, it is necessary to account for possible temporal shifts when analysing these stances. This study aims to examine the impact of temporal concept drift on stance detection towards COVID-19 vaccination on Twitter. To this end, we evaluate a range of transformer-based models using chronological (splitting the training, validation, and test sets in order of time) and random splits (randomly splitting these three sets) of social media data. Our findings reveal significant discrepancies in model performance between random and chronological splits in several existing COVID-19-related datasets; specifically, chronological splits significantly reduce the accuracy of stance classification. Therefore, real-world stance detection approaches need to be further refined to incorporate temporal factors as a key consideration.

pdf bib abs

Navigating Prompt Complexity for Zero-Shot Classification: A Study of Large Language Models in Computational Social Science
Yida Mu | Ben P. Wu | William Thorne | Ambrose Robinson | Nikolaos Aletras | Carolina Scarton | Kalina Bontcheva | Xingyi Song
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Instruction-tuned Large Language Models (LLMs) have exhibited impressive language understanding and the capacity to generate responses that follow specific prompts. However, due to the computational demands associated with training these models, their applications often adopt a zero-shot setting. In this paper, we evaluate the zero-shot performance of two publicly accessible LLMs, ChatGPT and OpenAssistant, in the context of six Computational Social Science classification tasks, while also investigating the effects of various prompting strategies. Our experiments investigate the impact of prompt complexity, including the effect of incorporating label definitions into the prompt; use of synonyms for label names; and the influence of integrating past memories during foundation model training. The findings indicate that in a zero-shot setting, current LLMs are unable to match the performance of smaller, fine-tuned baseline transformer models (such as BERT-large). Additionally, we find that different prompting strategies can significantly affect classification accuracy, with variations in accuracy and F1 scores exceeding 10%.

pdf bib abs

Examining the Limitations of Computational Rumor Detection Models Trained on Static Datasets
Yida Mu | Xingyi Song | Kalina Bontcheva | Nikolaos Aletras
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

A crucial aspect of a rumor detection model is its ability to generalize, particularly its ability to detect emerging, previously unknown rumors. Past research has indicated that content-based (i.e., using solely source post as input) rumor detection models tend to perform less effectively on unseen rumors. At the same time, the potential of context-based models remains largely untapped. The main contribution of this paper is in the in-depth evaluation of the performance gap between content and context-based models specifically on detecting new, unseen rumors. Our empirical findings demonstrate that context-based models are still overly dependent on the information derived from the rumors’ source post and tend to overlook the significant role that contextual information can play. We also study the effect of data split strategies on classifier performance. Based on our experimental results, the paper also offers practical suggestions on how to minimize the effects of temporal concept drift in static datasets during the training of rumor detection methods.

pdf bib abs

SheffieldVeraAI at SemEval-2024 Task 4: Prompting and fine-tuning a Large Vision-Language Model for Binary Classification of Persuasion Techniques in Memes
Charlie Grimshaw | Kalina Bontcheva | Xingyi Song
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

This paper describes our approach for SemEval-2024 Task 4: Multilingual Detection of Persuasion Techniques in Memes. Specifically, we concentrate on Subtask 2b, a binary classification challenge that entails categorizing memes as either “propagandistic” or “non-propagandistic”. To address this task, we utilized the large multimodal pretrained model, LLaVa. We explored various prompting strategies and fine-tuning methods, and observed that the model, when not fine-tuned but provided with a few-shot learning examples, achieved the best performance. Additionally, we enhanced the model’s multilingual capabilities by integrating a machine translation model. Our system secured the 2nd place in the Arabic language category.

pdf bib abs

Optimising LLM-Driven Machine Translation with Context-Aware Sliding Windows
Xinye Yang | Yida Mu | Kalina Bontcheva | Xingyi Song
Proceedings of the Ninth Conference on Machine Translation

This paper describes SheffieldGATE’s submission to WMT 2024 Chat Shared Translation Task. We participate in three language pairs: English-German, English-Dutch, and English-Portuguese (Brazil). In this work, we introduce a context-aware sliding window decoding method to track dependencies between chat messages. We fine-tune a large pre-trained language model based on the training data provided by the shared task Our experiments (i) compare the model performance between multilingual and bilingual fine-tuning and (ii) assess the impact of different window sizes. Our experimental results demonstrate that utilising contextual information yields superior performance in document-level translation compared to translating documents as isolated text segments, and that models fine-tuned with multilingual data perform better than those fine-tuned with bilingual data.

pdf bib abs

Large Language Models Offer an Alternative to the Traditional Approach of Topic Modelling
Yida Mu | Chun Dong | Kalina Bontcheva | Xingyi Song
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Topic modelling, as a well-established unsupervised technique, has found extensive use in automatically detecting significant topics within a corpus of documents. However, classic topic modelling approaches (e.g., LDA) have certain drawbacks, such as the lack of semantic understanding and the presence of overlapping topics. In this work, we investigate the untapped potential of large language models (LLMs) as an alternative for uncovering the underlying topics within extensive text corpora. To this end, we introduce a framework that prompts LLMs to generate topics from a given set of documents and establish evaluation protocols to assess the clustering efficacy of LLMs. Our findings indicate that LLMs with appropriate prompts can stand out as a viable alternative, capable of generating relevant topic titles and adhering to human guidelines to refine and merge topics. Through in-depth experiments and evaluation, we summarise the advantages and constraints of employing LLMs in topic extraction.

2023

pdf bib abs

SheffieldVeraAI at SemEval-2023 Task 3: Mono and Multilingual Approaches for News Genre, Topic and Persuasion Technique Classification
Ben Wu | Olesya Razuvayevskaya | Freddy Heppell | João A. Leite | Carolina Scarton | Kalina Bontcheva | Xingyi Song
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper describes our approach for SemEval- 2023 Task 3: Detecting the category, the fram- ing, and the persuasion techniques in online news in a multilingual setup. For Subtask 1 (News Genre), we propose an ensemble of fully trained and adapter mBERT models which was ranked joint-first for German, and had the high- est mean rank of multi-language teams. For Subtask 2 (Framing), we achieved first place in 3 languages, and the best average rank across all the languages, by using two separate ensem- bles: a monolingual RoBERTa-MUPPETLARGE and an ensemble of XLM-RoBERTaLARGE with adapters and task adaptive pretraining. For Sub- task 3 (Persuasion Techniques), we trained a monolingual RoBERTa-Base model for English and a multilingual mBERT model for the re- maining languages, which achieved top 10 for all languages, including 2nd for English. For each subtask, we compared monolingual and multilingual approaches, and considered class imbalance techniques.

pdf bib abs

Categorising Fine-to-Coarse Grained Misinformation: An Empirical Study of the COVID-19 Infodemic
Ye Jiang | Xingyi Song | Carolina Scarton | Iknoor Singh | Ahmet Aker | Kalina Bontcheva
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

The spread of COVID-19 misinformation on social media became a major challenge for citizens, with negative real-life consequences. Prior research focused on detection and/or analysis of COVID-19 misinformation. However, fine-grained classification of misinformation claims has been largely overlooked. The novel contribution of this paper is in introducing a new dataset which makes fine-grained distinctions between statements that assert, comment or question on false COVID-19 claims. This new dataset not only enables social behaviour analysis but also enables us to address both evidence-based and non-evidence-based misinformation classification tasks. Lastly, through leave claim out cross-validation, we demonstrate that classifier performance on unseen COVID-19 misinformation claims is significantly different, as compared to performance on topics present in the training data.

pdf bib abs

GATE Teamware 2: An open-source tool for collaborative document classification annotation
David Wilby | Twin Karmakharm | Ian Roberts | Xingyi Song | Kalina Bontcheva
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

We present GATE Teamware 2: an open-source web-based platform for managing teams of annotators working on document classification tasks. GATE Teamware 2 is an entirely re-engineered successor to GATE Teamware, using contemporary web frameworks. The software allows the management of teams of multiple annotators, project managers and administrators - including the management of annotators - across multiple projects. Projects can be configured to control and monitor the annotation statistics and have a highly flexible JSON-configurable annotation display which can include arbitrary HTML. Optionally, documents can be uploaded with pre-existing annotations and documents are served to annotators in a random order by default to reduce bias. Crucially, annotators can be trained on applying the annotation guidelines correctly and then screened for quality assurance purposes, prior to being cleared for independent annotation. GATE Teamware 2 can be self-deployed, including in container orchestration environments, or provided as private, hosted cloud instances.GATE Teamware 2 is an open-source software and can be downloaded from https://github.com/GATENLP/gate-teamware.A demonstration video of the system has also been made available at https://youtu.be/KoXkuhc4fmM.

pdf bib abs

Classifying COVID-19 Vaccine Narratives
Yue Li | Carolina Scarton | Xingyi Song | Kalina Bontcheva
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

Vaccine hesitancy is widespread, despite the government’s information campaigns and the efforts of the World Health Organisation (WHO). Categorising the topics within vaccine-related narratives is crucial to understand the concerns expressed in discussions and identify the specific issues that contribute to vaccine hesitancy. This paper addresses the need for monitoring and analysing vaccine narratives online by introducing a novel vaccine narrative classification task, which categorises COVID-19 vaccine claims into one of seven categories. Following a data augmentation approach, we first construct a novel dataset for this new classification task, focusing on the minority classes. We also make use of fact-checker annotated data. The paper also presents a neural vaccine narrative classifier that achieves an accuracy of 84% under cross-validation. The classifier is publicly available for researchers and journalists.