SINAI at SemEval-2021 Task 5: Combining Embeddings in a BiLSTM-CRF model for Toxic Spans Detection

Flor Miriam Plaza-del-Arco; Pilar López-Úbeda; L. Alfonso Ureña-López; M. Teresa Martín-Valdivia

doi:10.18653/v1/2021.semeval-1.134

Abstract

This paper describes the participation of the SINAI team in SemEval-2021 Task 5: Toxic Spans Detection, which consists of identifying the spans that make a text toxic. Although several resources and systems have been developed in the context of offensive language, both annotation efforts and shared tasks have mainly focused on classifying whether a text is offensive or not. However, detecting toxic spans is crucial for identifying why a text is toxic, and it can help human moderators locate this type of content on social media. To accomplish the task, we follow a deep learning-based approach using a bidirectional Long Short-Term Memory network with a stacked Conditional Random Field decoding layer (BiLSTM-CRF). Specifically, we test the performance of combinations of different pre-trained word embeddings for recognizing toxic entities in text. The results show that combining word embeddings helps in detecting offensive content. Our team ranks 29th out of 91 participants.
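
The two ideas named in the abstract, combining several word embeddings per token and decoding tags with a CRF layer, can be illustrated with a minimal, untrained sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names `concat_embeddings` and `viterbi`, the use of plain concatenation to combine embeddings, the three-tag O/B/I scheme, and all toy scores. In a real BiLSTM-CRF the emission scores would come from the BiLSTM and the transition scores would be learned.

```python
from typing import Dict, List, Sequence


def concat_embeddings(
    token: str,
    tables: Sequence[Dict[str, List[float]]],
    dims: Sequence[int],
) -> List[float]:
    """Concatenate a token's vectors from several embedding tables.

    Assumption: embeddings are combined by concatenation, and a token
    missing from a table contributes a zero vector of that table's size.
    """
    vec: List[float] = []
    for table, dim in zip(tables, dims):
        vec.extend(table.get(token, [0.0] * dim))
    return vec


def viterbi(
    emissions: List[List[float]],
    transitions: List[List[float]],
) -> List[int]:
    """Return the highest-scoring tag sequence of a linear-chain CRF.

    emissions[t][j]   : score of tag j at position t
    transitions[i][j] : score of moving from tag i to tag j
    """
    if not emissions:
        return []
    n_tags = len(emissions[0])
    score = list(emissions[0])          # best score ending in each tag
    backptr: List[List[int]] = []       # best previous tag per position
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags),
                         key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j]
                             + emissions[t][j])
            ptrs.append(best_i)
        score = new_score
        backptr.append(ptrs)
    # Follow the back-pointers from the best final tag.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for ptrs in reversed(backptr):
        best = ptrs[best]
        path.append(best)
    path.reverse()
    return path


# Toy example with hypothetical tags O=0, B=1, I=2 and hand-picked scores.
O, B, I = 0, 1, 2
toy_transitions = [[0.0, 0.0, -100.0],  # O -> I is strongly penalized
                   [0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0]]
toy_emissions = [[2.0, 0.0, 0.0],
                 [0.0, 1.0, 1.5],
                 [0.0, 0.0, 2.0]]
toy_path = viterbi(toy_emissions, toy_transitions)

toy_vector = concat_embeddings(
    "idiot", [{"idiot": [1.0, 2.0]}, {"idiot": [3.0]}], dims=(2, 1)
)
```

In the toy example, choosing the best tag independently for each token would label the second token I directly after O. The transition penalty makes the decoder start the span with B instead, which is the kind of sequence-level consistency a CRF layer is meant to add on top of per-token scores.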

Anthology ID: 2021.semeval-1.134
Volume: Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
Month: August
Year: 2021
Address: Online
Editors: Alexis Palmer, Nathan Schneider, Natalie Schluter, Guy Emerson, Aurelie Herbelot, Xiaodan Zhu
Venue: SemEval
SIG: SIGLEX
Publisher: Association for Computational Linguistics
Pages: 984–989
URL: https://aclanthology.org/2021.semeval-1.134/
DOI: 10.18653/v1/2021.semeval-1.134
Cite (ACL): Flor Miriam Plaza-del-Arco, Pilar López-Úbeda, L. Alfonso Ureña-López, and M. Teresa Martín-Valdivia. 2021. SINAI at SemEval-2021 Task 5: Combining Embeddings in a BiLSTM-CRF model for Toxic Spans Detection. In Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), pages 984–989, Online. Association for Computational Linguistics.
Cite (Informal): SINAI at SemEval-2021 Task 5: Combining Embeddings in a BiLSTM-CRF model for Toxic Spans Detection (Plaza-del-Arco et al., SemEval 2021)
PDF: https://aclanthology.org/2021.semeval-1.134.pdf