Aggressive Language Identification Using Word Embeddings and Sentiment Features

Constantin Orăsan


Abstract
This paper describes our participation in the First Shared Task on Aggression Identification. The proposed method uses machine learning to identify social media texts that contain aggression. Its main features are information extracted from word embeddings and the output of a sentiment analyser. Several machine learning methods and different combinations of features were tried, and the official submissions used Support Vector Machines (SVMs) and Random Forests. The official evaluation showed that Random Forests work best for texts similar to those in the training dataset, whilst SVMs are a better choice for texts that differ from them. The evaluation also showed that, despite its simplicity, the method performs well when compared with more elaborate methods.
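The abstract describes feature vectors that combine word-embedding information with the output of a sentiment analyser. Below is a minimal sketch of one way such a vector could be built. It assumes the embedding features are the mean of the token vectors and the sentiment features are summed positive and negative lexicon scores. The toy vectors, the toy lexicon, and the function names are all hypothetical and are not taken from the paper or its code repository. In a full pipeline, these vectors would be passed to a classifier such as an SVM or a Random Forest, a step this sketch leaves out.

```python
from typing import Dict, List

# Hypothetical toy embeddings; a real system would load pretrained word vectors.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "you": [0.25, 0.5],
    "idiot": [0.75, -0.5],
    "thanks": [-0.25, 1.0],
    "friend": [-0.5, 0.5],
}
DIM = 2  # dimensionality of the toy embeddings

# Hypothetical toy lexicon standing in for the output of a sentiment analyser.
TOY_SENTIMENT: Dict[str, float] = {"idiot": -1.0, "thanks": 1.0, "friend": 0.5}


def embedding_features(tokens: List[str]) -> List[float]:
    """Average the vectors of in-vocabulary tokens (zero vector if none are known)."""
    vectors = [TOY_EMBEDDINGS[t] for t in tokens if t in TOY_EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(DIM)]


def sentiment_features(tokens: List[str]) -> List[float]:
    """Return [total positive score, total negative magnitude] from the toy lexicon."""
    scores = [TOY_SENTIMENT[t] for t in tokens if t in TOY_SENTIMENT]
    positive = float(sum(s for s in scores if s > 0))
    negative = float(sum(-s for s in scores if s < 0))
    return [positive, negative]


def text_features(text: str) -> List[float]:
    """Concatenate embedding and sentiment features for one text (whitespace tokenisation)."""
    tokens = text.lower().split()
    return embedding_features(tokens) + sentiment_features(tokens)
```

For example, `text_features("You idiot")` yields `[0.5, 0.0, 0.0, 1.0]`: the first two values are the averaged embedding and the last two are the positive and negative sentiment totals. Concatenating the two groups lets a single classifier weigh both kinds of evidence.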
Anthology ID:
W18-4414
Volume:
Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-2018)
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Editors:
Ritesh Kumar, Atul Kr. Ojha, Marcos Zampieri, Shervin Malmasi
Venue:
TRAC
Publisher:
Association for Computational Linguistics
Pages:
113–119
URL:
https://aclanthology.org/W18-4414
Cite (ACL):
Constantin Orăsan. 2018. Aggressive Language Identification Using Word Embeddings and Sentiment Features. In Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-2018), pages 113–119, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
Aggressive Language Identification Using Word Embeddings and Sentiment Features (Orăsan, TRAC 2018)
PDF:
https://aclanthology.org/W18-4414.pdf
Code:
dinel/aggression_identification