iCompass at SemEval-2020 Task 12: From a Syntax-ignorant N-gram Embeddings Model to a Deep Bidirectional Language Model

Abir Messaoudi, Hatem Haddad, Moez Ben Haj Hmida


Abstract
We describe the system we submitted to SemEval-2020. We tackled Task 12, “Multilingual Offensive Language Identification in Social Media”, specifically subtask 4A-Arabic. We propose three Arabic offensive language identification models: Tw-StAR, BERT, and BERT+BiLSTM. Two Arabic abusive/hate datasets, L-HSAB and T-HSAB, were added to the training data. The final submission was chosen on the basis of best performance, which was achieved by the BERT+BiLSTM model.
Anthology ID:
2020.semeval-1.260
Volume:
Proceedings of the Fourteenth Workshop on Semantic Evaluation
Month:
December
Year:
2020
Address:
Barcelona (online)
Editors:
Aurelie Herbelot, Xiaodan Zhu, Alexis Palmer, Nathan Schneider, Jonathan May, Ekaterina Shutova
Venue:
SemEval
SIG:
SIGLEX
Publisher:
International Committee for Computational Linguistics
Pages:
1978–1982
URL:
https://aclanthology.org/2020.semeval-1.260
DOI:
10.18653/v1/2020.semeval-1.260
Cite (ACL):
Abir Messaoudi, Hatem Haddad, and Moez Ben Haj Hmida. 2020. iCompass at SemEval-2020 Task 12: From a Syntax-ignorant N-gram Embeddings Model to a Deep Bidirectional Language Model. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, pages 1978–1982, Barcelona (online). International Committee for Computational Linguistics.
Cite (Informal):
iCompass at SemEval-2020 Task 12: From a Syntax-ignorant N-gram Embeddings Model to a Deep Bidirectional Language Model (Messaoudi et al., SemEval 2020)
PDF:
https://aclanthology.org/2020.semeval-1.260.pdf