Danka Jokić


2020

Multi-word Expressions for Abusive Speech Detection in Serbian
Ranka Stanković | Jelena Mitrović | Danka Jokić | Cvetana Krstev
Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons

This paper presents our work on refining and improving the Serbian part of Hurtlex, a multilingual lexicon of words to hurt. We pay special attention to adding multi-word expressions (MWEs) that can be seen as abusive, since such lexical entries are very important for obtaining good results in a wide range of abusive language detection tasks. We use Serbian morphological dictionaries as a basis for data cleaning and MWE dictionary creation. We outline a connection to other lexical and semantic resources in Serbian and foresee building abusive language detection systems based on that connection.