Freddy Heppell


2024

Multilinguality in the VIGILANT project
Brendan Spillane | Carolina Scarton | Robert Moro | Petar Ivanov | Andrey Tagarev | Jakub Simko | Ibrahim Abu Farha | Gary Munnelly | Filip Uhlárik | Freddy Heppell
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 2)

VIGILANT (Vital IntelliGence to Investigate ILlegAl DisiNformaTion) is a three-year Horizon Europe project that will equip European Law Enforcement Agencies (LEAs) with advanced disinformation detection and analysis tools to investigate and prevent criminal activities linked to disinformation. These include disinformation instigating violence towards minorities, promoting false medical cures, and increasing tensions between groups causing civil unrest and violent acts. VIGILANT’s four LEAs require support for English, Spanish, Catalan, Greek, Estonian, Romanian and Russian. Therefore, multilinguality is a major challenge and we present the current status of our tools and our plans to improve their performance.

2023

SheffieldVeraAI at SemEval-2023 Task 3: Mono and Multilingual Approaches for News Genre, Topic and Persuasion Technique Classification
Ben Wu | Olesya Razuvayevskaya | Freddy Heppell | João A. Leite | Carolina Scarton | Kalina Bontcheva | Xingyi Song
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper describes our approach for SemEval-2023 Task 3: Detecting the category, the framing, and the persuasion techniques in online news in a multilingual setup. For Subtask 1 (News Genre), we propose an ensemble of fully trained and adapter mBERT models which was ranked joint-first for German, and had the highest mean rank of multi-language teams. For Subtask 2 (Framing), we achieved first place in 3 languages, and the best average rank across all the languages, by using two separate ensembles: a monolingual RoBERTa-MUPPET-Large and an ensemble of XLM-RoBERTa-Large with adapters and task adaptive pretraining. For Subtask 3 (Persuasion Techniques), we trained a monolingual RoBERTa-Base model for English and a multilingual mBERT model for the remaining languages, which achieved top 10 for all languages, including 2nd for English. For each subtask, we compared monolingual and multilingual approaches, and considered class imbalance techniques.

Analysing State-Backed Propaganda Websites: a New Dataset and Linguistic Study
Freddy Heppell | Kalina Bontcheva | Carolina Scarton
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

This paper analyses two hitherto unstudied sites sharing state-backed disinformation, Reliable Recent News (rrn.world) and WarOnFakes (waronfakes.com), which publish content in Arabic, Chinese, English, French, German, and Spanish. We describe our content acquisition methodology and perform cross-site unsupervised topic clustering on the resulting multilingual dataset. We also perform linguistic and temporal analysis of the web page translations and topics over time, and investigate articles with false publication dates. We make publicly available this new dataset of 14,053 articles, annotated with each language version, and additional metadata such as links and images. The main contribution of this paper for the NLP community is in the novel dataset which enables studies of disinformation networks, and the training of NLP tools for disinformation detection.