Jelena Mitrović


2022

GRhOOT: Ontology of Rhetorical Figures in German
Ramona Kühn | Jelena Mitrović | Michael Granitzer
Proceedings of the Thirteenth Language Resources and Evaluation Conference

GRhOOT, the German RhetOrical OnTology, is a domain ontology of 110 rhetorical figures in the German language. The overall goal of building an ontology of rhetorical figures in German is not only the formal representation of different rhetorical figures, but also allowing for their easier detection, thus improving sentiment analysis, argument mining, detection of hate speech and fake news, machine translation, and many other tasks in which recognition of non-literal language plays an important role. The challenge of building such ontologies lies in classifying the figures and assigning adequate characteristics to group them, while considering their distinctive features. The ontology of rhetorical figures in the Serbian language was used as a basis for our work. Besides transferring and extending the concepts of the Serbian ontology, we ensured completeness and consistency by using description logic and SPARQL queries. Furthermore, we show a decision tree to identify figures and suggest a usage scenario on how the ontology can be utilized to collect and annotate data.

2021

HateBERT: Retraining BERT for Abusive Language Detection in English
Tommaso Caselli | Valerio Basile | Jelena Mitrović | Michael Granitzer
Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021)

We introduce HateBERT, a re-trained BERT model for abusive language detection in English. The model was trained on RAL-E, a large-scale dataset of Reddit comments in English from communities banned for being offensive, abusive, or hateful that we have curated and made available to the public. We present the results of a detailed comparison between a general pre-trained language model and the retrained version on three English datasets for offensive, abusive language and hate speech detection tasks. In all datasets, HateBERT outperforms the corresponding general BERT model. We also discuss a battery of experiments comparing the portability of the fine-tuned models across the datasets, suggesting that portability is affected by compatibility of the annotated phenomena.

Proceedings of the 17th Workshop on Multiword Expressions (MWE 2021)
Paul Cook | Jelena Mitrović | Carla Parra Escartín | Ashwini Vaidya | Petya Osenova | Shiva Taslimipoor | Carlos Ramisch

2020

Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons
Stella Markantonatou | John McCrae | Jelena Mitrović | Carole Tiberius | Carlos Ramisch | Ashwini Vaidya | Petya Osenova | Agata Savary

Multi-word Expressions for Abusive Speech Detection in Serbian
Ranka Stanković | Jelena Mitrović | Danka Jokić | Cvetana Krstev
Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons

This paper presents our work on the refinement and improvement of the Serbian language part of Hurtlex, a multilingual lexicon of words to hurt. We pay special attention to adding multi-word expressions (MWEs) that can be seen as abusive, as such lexical entries are very important for obtaining good results in a plethora of abusive language detection tasks. We use Serbian morphological dictionaries as a basis for data cleaning and MWE dictionary creation. We outline a connection to other lexical and semantic resources in Serbian and foresee building abusive language detection systems based on that connection.

Language Proficiency Scoring
Cristina Arhiliuc | Jelena Mitrović | Michael Granitzer
Proceedings of the Twelfth Language Resources and Evaluation Conference

The Common European Framework of Reference (CEFR) provides generic guidelines for the evaluation of language proficiency. Nevertheless, for automated proficiency classification systems, different approaches for different languages are proposed. Our paper evaluates and extends the results of an approach to Automatic Essay Scoring proposed as a part of the REPROLANG 2020 challenge. We provide a comparison between our results and the ones from the published paper and we include a new corpus for the English language for further experiments. Our results are lower than the expected ones when using the same approach and the system does not scale well with the added English corpus.

I Feel Offended, Don’t Be Abusive! Implicit/Explicit Messages in Offensive and Abusive Language
Tommaso Caselli | Valerio Basile | Jelena Mitrović | Inga Kartoziya | Michael Granitzer
Proceedings of the Twelfth Language Resources and Evaluation Conference

Abusive language detection is an unsolved and challenging problem for the NLP community. Recent literature suggests various approaches to distinguish between different language phenomena (e.g., hate speech vs. cyberbullying vs. offensive language) and factors (degree of explicitness and target) that may help to classify different abusive language phenomena. There are data sets that annotate the target of abusive messages (i.e., OLID/OffensEval (Zampieri et al., 2019a)). However, there is a lack of data sets that take into account the degree of explicitness. In this paper, we propose annotation guidelines to distinguish between explicit and implicit abuse in English and apply them to OLID/OffensEval. The outcome is a newly created resource, AbuseEval v1.0, which aims to address some of the existing issues in the annotation of offensive and abusive language (e.g., explicitness of the message, presence of a target, need for context, and interaction across different phenomena).

GruPaTo at SemEval-2020 Task 12: Retraining mBERT on Social Media and Fine-tuned Offensive Language Models
Davide Colla | Tommaso Caselli | Valerio Basile | Jelena Mitrović | Michael Granitzer
Proceedings of the Fourteenth Workshop on Semantic Evaluation

We introduce an approach to multilingual Offensive Language Detection based on the mBERT transformer model. We download extra training data from Twitter in English, Danish, and Turkish, and use it to re-train the model. We then fine-tune the model on the provided training data and, in some configurations, implement a transfer learning approach exploiting the typological relatedness between English and Danish. Our systems obtained good results across the three languages (.9036 for EN, .7619 for DA, and .7789 for TR).

NLP_Passau at SemEval-2020 Task 12: Multilingual Neural Network for Offensive Language Detection in English, Danish and Turkish
Omar Hussein | Hachem Sfar | Jelena Mitrović | Michael Granitzer
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper describes a neural network (NN) model that was used for participating in OffensEval, Task 12 of the SemEval 2020 workshop. The aim of this task is to identify offensive speech in social media, particularly in tweets. The model we used, C-BiGRU, is composed of a Convolutional Neural Network (CNN) along with a bidirectional Recurrent Neural Network (RNN). A multidimensional numerical representation (embedding) for each of the words in the tweets used by the model was determined using fastText. This allowed for using a dataset of labeled tweets to train the model on detecting combinations of words that may convey an offensive meaning. This model was used in sub-task A of the English, Turkish and Danish competitions of the workshop, achieving F1 scores of 90.88%, 76.76% and 76.70%, respectively.

nlpUP at SemEval-2020 Task 12 : A Blazing Fast System for Offensive Language Detection
Ehab Hamdy | Jelena Mitrović | Michael Granitzer
Proceedings of the Fourteenth Workshop on Semantic Evaluation

In this paper, we introduce our submission for the SemEval Task 12, sub-tasks A and B for offensive language identification and categorization in English tweets. This year the data set for Task A is significantly larger than in the previous year. Therefore, we have adapted the BlazingText algorithm to extract embedding representations and classify texts after filtering and sanitizing the dataset according to the conventional text patterns on social media. We gained both a speedy training process and a good F1 score of 90.88% on the test set. For sub-task B, we opted to fine-tune Bidirectional Encoder Representations from Transformers (BERT) to accommodate the limited data for categorizing offensive tweets. We achieved an F1 score of only 56.86%, but after experimenting with various label assignment thresholds in the pre-processing steps, the F1 score improved to 64%.

2019

nlpUP at SemEval-2019 Task 6: A Deep Neural Language Model for Offensive Language Detection
Jelena Mitrović | Bastian Birkeneder | Michael Granitzer
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper presents our submission for the SemEval shared task 6, sub-task A on the identification of offensive language. Our proposed model, C-BiGRU, combines a Convolutional Neural Network (CNN) with a bidirectional Recurrent Neural Network (RNN). We utilize word2vec to capture the semantic similarities between words. This composition allows us to extract long term dependencies in tweets and distinguish between offensive and non-offensive tweets. In addition, we evaluate our approach on a different dataset and show that our model is capable of detecting online aggressiveness in both English and German tweets. Our model achieved a macro F1-score of 79.40% on the SemEval dataset.

Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019)
Agata Savary | Carla Parra Escartín | Francis Bond | Jelena Mitrović | Verginica Barbu Mititelu

2016

A Language-independent Model for Introducing a New Semantic Relation Between Adjectives and Nouns in a WordNet
Miljana Mladenović | Jelena Mitrović | Cvetana Krstev
Proceedings of the 8th Global WordNet Conference (GWC)

The aim of this paper is to show a language-independent process of creating a new semantic relation between adjectives and nouns in wordnets. The existence of such a relation is expected to improve the detection of figurative language and sentiment analysis (SA). The proposed method uses an annotated corpus to explore the semantic knowledge contained in linguistic constructs performing as the rhetorical figure Simile. Based on the frequency of occurrence of similes in an annotated corpus, we propose a new relation, which connects the noun synset with the synset of an adjective representing that noun’s specific attribute. We elaborate on adding this new relation in the case of the Serbian WordNet (SWN). The proposed method is evaluated by human judgement in order to determine the relevance of automatically selected relation items. The evaluation has shown that 84% of the automatically selected and the most frequent linguistic constructs, whose frequency threshold was equal to 3, were also selected by humans.

2014

Developing and Maintaining a WordNet: Procedures and Tools
Miljana Mladenović | Jelena Mitrović | Cvetana Krstev
Proceedings of the Seventh Global Wordnet Conference