Jelena Mitrović

2025

Enhancing Rhetorical Figure Annotation: An Ontology-Based Web Application with RAG Integration
Ramona Kühn | Jelena Mitrović | Michael Granitzer
Proceedings of the 31st International Conference on Computational Linguistics

Rhetorical figures play an important role in our communication. They are used to convey subtle, implicit meaning, or to emphasize statements. We notice them in hate speech, fake news, and propaganda. By improving the systems for computational detection of rhetorical figures, we can also improve tasks such as hate speech and fake news detection, sentiment analysis, opinion mining, or argument mining. Unfortunately, there is a lack of annotated data, as well as qualified annotators that would help us build large corpora to train machine learning models for the detection of rhetorical figures. The situation is particularly difficult in languages other than English, and for rhetorical figures other than metaphor, sarcasm, and irony. To overcome this issue, we develop a web application called “Find your Figure” that facilitates the identification and annotation of German rhetorical figures. The application is based on the German Rhetorical ontology GRhOOT which we have specially adapted for this purpose. In addition, we improve the user experience with Retrieval Augmented Generation (RAG). In this paper, we present the restructuring of the ontology, the development of the web application, and the built-in RAG pipeline. We also identify the optimal RAG settings for our application. Our approach is one of the first to practically use rhetorical ontologies in combination with RAG and shows promising results.

2024

Using Pre-Trained Language Models in an End-to-End Pipeline for Antithesis Detection
Ramona Kühn | Khouloud Saadi | Jelena Mitrović | Michael Granitzer
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Rhetorical figures play an important role in influencing readers and listeners. Some of these word constructs that deviate from the usual language structure are known to be persuasive – antithesis is one of them. This figure combines parallel phrases with opposite ideas or words to highlight a contradiction. By identifying this figure, persuasive actors can be better identified. For this task, we create an annotated German dataset for antithesis detection. The dataset consists of posts from a Telegram channel criticizing the COVID-19 politics in Germany. Furthermore, we propose a three-block pipeline approach to detect the figure antithesis using large language models. Our pipeline splits the text into phrases, identifies phrases with a syntactically parallel structure, and detects if these parallel phrase pairs present opposing ideas by fine-tuning the German ELECTRA model, a state-of-the-art deep learning model for the German language. Furthermore, we compare the results with multilingual BERT and German BERT. Our novel approach outperforms the state-of-the-art methods (F1-score of 50.43 %) for antithesis detection by achieving an F1-score of 65.11 %.

The Elephant in the Room: Ten Challenges of Computational Detection of Rhetorical Figures
Ramona Kühn | Jelena Mitrović
Proceedings of the 4th Workshop on Figurative Language Processing (FigLang 2024)

Computational detection of rhetorical figures focuses mostly on figures such as metaphor, irony, or analogy. However, there exist many more figures that are neither less important nor less prevalent. We wanted to pinpoint the reasons why researchers often avoid other figures and to shed light on the challenges they struggle with when investigating those figures. In this comprehensive survey, we analyzed over 40 papers dealing with the computational detection of rhetorical figures other than metaphor, simile, sarcasm, and irony. We encountered recurrent challenges from which we compiled a ten-point list. Furthermore, we suggest solutions for each challenge to encourage researchers to investigate a greater variety of rhetorical figures.

2023

Learn From One Specialized Sub-Teacher: One-to-One Mapping for Feature-Based Knowledge Distillation
Khouloud Saadi | Jelena Mitrović | Michael Granitzer
Findings of the Association for Computational Linguistics: EMNLP 2023

Knowledge distillation is known as an effective technique for compressing over-parameterized language models. In this work, we propose to break down the global feature distillation task into N local sub-tasks. In this new framework, we consider each neuron in the last hidden layer of the teacher network as a specialized sub-teacher. We also consider each neuron in the last hidden layer of the student network as a focused sub-student. We make each focused sub-student learn from one corresponding specialized sub-teacher and ignore the others. This will facilitate the task for the sub-student and keep it focused. Our proposed method is novel and can be combined with other distillation techniques. Empirical results show that our proposed approach outperforms the state-of-the-art methods by maintaining higher performance on most benchmark datasets. Furthermore, we propose a randomized variant of our approach, called Masked One-to-One Mapping. Rather than learning all the N sub-tasks simultaneously, we focus on learning a subset of these sub-tasks at each optimization step. This variant enables the student to digest the received flow of knowledge more effectively and yields superior results.
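
The one-to-one mapping described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the per-neuron loss, so a squared error is assumed here, and the function names `one_to_one_loss` and `sample_mask` are hypothetical. The sketch also assumes that student and teacher have the same last-hidden-layer width N.

```python
import random


def one_to_one_loss(student, teacher, mask=None):
    """Average the N local losses; student neuron i only sees teacher neuron i.

    student, teacher: batches of last-hidden-layer activations, shape [batch][N].
    mask: optional indices of the sub-tasks to train at this step (masked variant).
    """
    n = len(teacher[0])
    indices = list(range(n)) if mask is None else list(mask)
    total = 0.0
    for i in indices:
        # Sub-task i: squared error over the batch (assumed loss, see lead-in).
        errors = [(s[i] - t[i]) ** 2 for s, t in zip(student, teacher)]
        total += sum(errors) / len(errors)
    return total / len(indices)


def sample_mask(n, keep, rng):
    """Pick `keep` of the N sub-tasks at random for one optimization step."""
    return sorted(rng.sample(range(n), keep))


student = [[1.0, 2.0], [3.0, 4.0]]
teacher = [[1.0, 0.0], [3.0, 0.0]]
full = one_to_one_loss(student, teacher)             # all N sub-tasks
only_first = one_to_one_loss(student, teacher, [0])  # masked: neuron 0 only
```

In the masked variant, a fresh mask from `sample_mask` would be drawn at every optimization step so that only a subset of the sub-tasks contributes to the loss.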

Hidden in Plain Sight: Can German Wiktionary and Wordnets Facilitate the Detection of Antithesis?
Ramona Kuehn | Jelena Mitrović | Michael Granitzer
Proceedings of the 12th Global Wordnet Conference

Existing wordnets mainly focus on synonyms, while antonyms have often been neglected, especially in wordnets in languages other than English. In this paper, we show how regular expressions are used to generate an antonym resource for German by using Wiktionary as a source. This resource contains antonyms for 45499 words. The antonyms can be used to extend existing wordnets. We show that this is important by comparing our antonym resource to the antonyms in OdeNet, the only freely available German wordnet that contains antonyms for 3059 words. We demonstrate that antonyms are relevant for the detection of the rhetorical figure antithesis. This figure has been known to influence the audience by creating contradiction and using a parallel sentence structure combined with antonyms. We first detect parallelism with part-of-speech tags and then apply our rule-based antithesis detection algorithm to a dataset of the messenger service Telegram. We evaluate our approach and achieve a precision of 57% and a recall of 45%, thus outperforming existing approaches.
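
The two-step idea in this abstract (parallelism from part-of-speech tags, then an antonym check) might look roughly like the sketch below. The exact rules of the paper are not given in the abstract, so everything here is an assumption: phrases count as parallel when their tag sequences are identical, the antonym lexicon is a toy dictionary rather than the Wiktionary-derived resource, and the names `is_parallel` and `is_antithesis` are hypothetical.

```python
def is_parallel(phrase_a, phrase_b):
    """Treat two phrases as parallel if their POS tag sequences are identical."""
    return [pos for _, pos in phrase_a] == [pos for _, pos in phrase_b]


def is_antithesis(phrase_a, phrase_b, antonyms):
    """Flag a parallel phrase pair in which aligned words are antonyms."""
    if not is_parallel(phrase_a, phrase_b):
        return False
    for (word_a, _), (word_b, _) in zip(phrase_a, phrase_b):
        a, b = word_a.lower(), word_b.lower()
        # Look the pair up in both directions; the lexicon may be one-sided.
        if b in antonyms.get(a, ()) or a in antonyms.get(b, ()):
            return True
    return False


# Toy lexicon standing in for an antonym resource (illustrative only).
toy_antonyms = {"gewinnen": {"verlieren"}}

first = [("die", "DET"), ("Reichen", "NOUN"), ("gewinnen", "VERB")]
second = [("die", "DET"), ("Armen", "NOUN"), ("verlieren", "VERB")]
third = [("alle", "DET"), ("Menschen", "NOUN"), ("schlafen", "VERB")]

found = is_antithesis(first, second, toy_antonyms)
not_found = is_antithesis(first, third, toy_antonyms)
```

A real pipeline would lemmatize the tokens before the lookup, since inflected forms such as "Reichen" would otherwise miss lexicon entries keyed by base form.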

2022

GRhOOT: Ontology of Rhetorical Figures in German
Ramona Kühn | Jelena Mitrović | Michael Granitzer
Proceedings of the Thirteenth Language Resources and Evaluation Conference

GRhOOT, the German RhetOrical OnTology, is a domain ontology of 110 rhetorical figures in the German language. The overall goal of building an ontology of rhetorical figures in German is not only the formal representation of different rhetorical figures, but also allowing for their easier detection, thus improving sentiment analysis, argument mining, detection of hate speech and fake news, machine translation, and many other tasks in which recognition of non-literal language plays an important role. The challenge of building such ontologies lies in classifying the figures and assigning adequate characteristics to group them, while considering their distinctive features. The ontology of rhetorical figures in the Serbian language was used as a basis for our work. Besides transferring and extending the concepts of the Serbian ontology, we ensured completeness and consistency by using description logic and SPARQL queries. Furthermore, we show a decision tree to identify figures and suggest a usage scenario on how the ontology can be utilized to collect and annotate data.

2021

HateBERT: Retraining BERT for Abusive Language Detection in English
Tommaso Caselli | Valerio Basile | Jelena Mitrović | Michael Granitzer
Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021)

We introduce HateBERT, a re-trained BERT model for abusive language detection in English. The model was trained on RAL-E, a large-scale dataset of Reddit comments in English from communities banned for being offensive, abusive, or hateful that we have curated and made available to the public. We present the results of a detailed comparison between a general pre-trained language model and the retrained version on three English datasets for offensive, abusive language and hate speech detection tasks. In all datasets, HateBERT outperforms the corresponding general BERT model. We also discuss a battery of experiments comparing the portability of the fine-tuned models across the datasets, suggesting that portability is affected by compatibility of the annotated phenomena.

2020

nlpUP at SemEval-2020 Task 12 : A Blazing Fast System for Offensive Language Detection
Ehab Hamdy | Jelena Mitrović | Michael Granitzer
Proceedings of the Fourteenth Workshop on Semantic Evaluation

In this paper, we introduce our submission for the SemEval Task 12, sub-tasks A and B for offensive language identification and categorization in English tweets. This year the data set for Task A is significantly larger than in the previous year. Therefore, we have adapted the BlazingText algorithm to extract embedding representation and classify texts after filtering and sanitizing the dataset according to the conventional text patterns on social media. We benefited from a speedy training process and obtained a good F1 score of 90.88% on the test set. For sub-task B, we opted to fine-tune Bidirectional Encoder Representations from Transformers (BERT) to accommodate the limited data for categorizing offensive tweets. We have achieved an F1 score of only 56.86%, but after experimenting with various label assignment thresholds in the pre-processing steps, the F1 score improved to 64%.

I Feel Offended, Don’t Be Abusive! Implicit/Explicit Messages in Offensive and Abusive Language
Tommaso Caselli | Valerio Basile | Jelena Mitrović | Inga Kartoziya | Michael Granitzer
Proceedings of the Twelfth Language Resources and Evaluation Conference

Abusive language detection is an unsolved and challenging problem for the NLP community. Recent literature suggests various approaches to distinguish between different language phenomena (e.g., hate speech vs. cyberbullying vs. offensive language) and factors (degree of explicitness and target) that may help to classify different abusive language phenomena. There are data sets that annotate the target of abusive messages (i.e., OLID/OffensEval (Zampieri et al., 2019a)). However, there is a lack of data sets that take into account the degree of explicitness. In this paper, we propose annotation guidelines to distinguish between explicit and implicit abuse in English and apply them to OLID/OffensEval. The outcome is a newly created resource, AbuseEval v1.0, which aims to address some of the existing issues in the annotation of offensive and abusive language (e.g., explicitness of the message, presence of a target, need of context, and interaction across different phenomena).

GruPaTo at SemEval-2020 Task 12: Retraining mBERT on Social Media and Fine-tuned Offensive Language Models
Davide Colla | Tommaso Caselli | Valerio Basile | Jelena Mitrović | Michael Granitzer
Proceedings of the Fourteenth Workshop on Semantic Evaluation

We introduce an approach to multilingual Offensive Language Detection based on the mBERT transformer model. We download extra training data from Twitter in English, Danish, and Turkish, and use it to re-train the model. We then fine-tune the model on the provided training data and, in some configurations, implement a transfer learning approach exploiting the typological relatedness between English and Danish. Our systems obtained good results across the three languages (.9036 for EN, .7619 for DA, and .7789 for TR).

Multi-word Expressions for Abusive Speech Detection in Serbian
Ranka Stanković | Jelena Mitrović | Danka Jokić | Cvetana Krstev
Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons

This paper presents our work on the refinement and improvement of the Serbian language part of Hurtlex, a multilingual lexicon of words to hurt. We pay special attention to adding Multi-word expressions that can be seen as abusive, as such lexical entries are very important in obtaining good results in a plethora of abusive language detection tasks. We use Serbian morphological dictionaries as a basis for data cleaning and MWE dictionary creation. A connection to other lexical and semantic resources in Serbian is outlined and building of abusive language detection systems based on that connection is foreseen.

NLP_Passau at SemEval-2020 Task 12: Multilingual Neural Network for Offensive Language Detection in English, Danish and Turkish
Omar Hussein | Hachem Sfar | Jelena Mitrović | Michael Granitzer
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper describes a neural network (NN) model that was used for participating in the OffensEval, Task 12 of the SemEval 2020 workshop. The aim of this task is to identify offensive speech in social media, particularly in tweets. The model we used, C-BiGRU, is composed of a Convolutional Neural Network (CNN) along with a bidirectional Recurrent Neural Network (RNN). A multidimensional numerical representation (embedding) for each word in the tweets used by the model was determined using fastText. This allowed for using a dataset of labeled tweets to train the model on detecting combinations of words that may convey an offensive meaning. This model was used in the sub-task A of the English, Turkish and Danish competitions of the workshop, achieving F1 scores of 90.88%, 76.76% and 76.70%, respectively.

Language Proficiency Scoring
Cristina Arhiliuc | Jelena Mitrović | Michael Granitzer
Proceedings of the Twelfth Language Resources and Evaluation Conference

The Common European Framework of Reference (CEFR) provides generic guidelines for the evaluation of language proficiency. Nevertheless, for automated proficiency classification systems, different approaches for different languages are proposed. Our paper evaluates and extends the results of an approach to Automatic Essay Scoring proposed as a part of the REPROLANG 2020 challenge. We provide a comparison between our results and the ones from the published paper and we include a new corpus for the English language for further experiments. Our results are lower than the expected ones when using the same approach and the system does not scale well with the added English corpus.

2019

Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019)
Agata Savary | Carla Parra Escartín | Francis Bond | Jelena Mitrović | Verginica Barbu Mititelu
Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019)

nlpUP at SemEval-2019 Task 6: A Deep Neural Language Model for Offensive Language Detection
Jelena Mitrović | Bastian Birkeneder | Michael Granitzer
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper presents our submission for the SemEval shared task 6, sub-task A on the identification of offensive language. Our proposed model, C-BiGRU, combines a Convolutional Neural Network (CNN) with a bidirectional Recurrent Neural Network (RNN). We utilize word2vec to capture the semantic similarities between words. This composition allows us to extract long term dependencies in tweets and distinguish between offensive and non-offensive tweets. In addition, we evaluate our approach on a different dataset and show that our model is capable of detecting online aggressiveness in both English and German tweets. Our model achieved a macro F1-score of 79.40% on the SemEval dataset.

2016

A Language-independent Model for Introducing a New Semantic Relation Between Adjectives and Nouns in a WordNet
Miljana Mladenović | Jelena Mitrović | Cvetana Krstev
Proceedings of the 8th Global WordNet Conference (GWC)

The aim of this paper is to show a language-independent process of creating a new semantic relation between adjectives and nouns in wordnets. The existence of such a relation is expected to improve the detection of figurative language and sentiment analysis (SA). The proposed method uses an annotated corpus to explore the semantic knowledge contained in linguistic constructs performing as the rhetorical figure Simile. Based on the frequency of occurrence of similes in an annotated corpus, we propose a new relation, which connects the noun synset with the synset of an adjective representing that noun’s specific attribute. We elaborate on adding this new relation in the case of the Serbian WordNet (SWN). The proposed method is evaluated by human judgement in order to determine the relevance of automatically selected relation items. The evaluation has shown that 84% of the automatically selected and the most frequent linguistic constructs, whose frequency threshold was equal to 3, were also selected by humans.