Danka Jokić
2020
Multi-word Expressions for Abusive Speech Detection in Serbian
Ranka Stanković | Jelena Mitrović | Danka Jokić | Cvetana Krstev
Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons
This paper presents our work on refining and improving the Serbian part of Hurtlex, a multilingual lexicon of words to hurt. We pay special attention to adding multi-word expressions (MWEs) that can be considered abusive, since such lexical entries are important for achieving good results in a wide range of abusive language detection tasks. We use Serbian morphological dictionaries as the basis for data cleaning and for creating an MWE dictionary. We outline a connection to other lexical and semantic resources for Serbian and foresee building abusive language detection systems on top of that connection.
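The abstract does not detail how the morphological dictionaries are applied. As a minimal sketch, assuming a dictionary that maps inflected forms to lemmas, MWE entries can be stored as lemma sequences and matched against lemmatized text, so that inflected variants of one entry are all found. The dictionary contents, the example idiom "glup kao noć" ("stupid as night"), and the names `MORPH_DICT`, `MWE_LEXICON`, `lemmatize` and `find_mwes` are hypothetical illustrations; they do not reproduce Hurtlex or the authors' tools.

```python
from typing import Dict, List, Tuple

# Hypothetical toy morphological dictionary: inflected form -> lemma.
# Real Serbian morphological dictionaries are far larger and also record
# grammatical categories; only the lemma is used in this sketch.
MORPH_DICT: Dict[str, str] = {
    "glup": "glup",
    "glupa": "glup",
    "glupi": "glup",
    "glupog": "glup",
    "kao": "kao",
    "noć": "noć",
}

# Hypothetical MWE lexicon: each entry is stored as a tuple of lemmas.
MWE_LEXICON: List[Tuple[str, ...]] = [
    ("glup", "kao", "noć"),
]


def lemmatize(tokens: List[str]) -> List[str]:
    """Map each token to its lemma; unknown tokens are kept, lowercased."""
    return [MORPH_DICT.get(t.lower(), t.lower()) for t in tokens]


def find_mwes(tokens: List[str]) -> List[Tuple[int, int, Tuple[str, ...]]]:
    """Return (start, end, entry) for every lexicon MWE whose lemmas match
    a contiguous run of tokens; `end` is exclusive."""
    lemmas = lemmatize(tokens)
    matches = []
    for entry in MWE_LEXICON:
        n = len(entry)
        # Slide a window of the entry's length over the lemmatized text.
        for i in range(len(lemmas) - n + 1):
            if tuple(lemmas[i:i + n]) == entry:
                matches.append((i, i + n, entry))
    return matches
```

For example, `find_mwes("Ona je glupa kao noć".split())` returns `[(2, 5, ('glup', 'kao', 'noć'))]`: the feminine form "glupa" is reduced to its lemma before comparison, so one lexicon entry covers all inflected variants listed in the dictionary.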