HITMI&T at SemEval-2021 Task 5: Integrating Transformer and CRF for Toxic Spans Detection

Chenyi Wang, Tianshu Liu, Tiejun Zhao


Abstract
This paper introduces our system for SemEval-2021 Task 5: Toxic Spans Detection. The task aims to accurately locate toxic spans within a text. Using the BIO tagging scheme, we model the task as token-level sequence labeling. Our system uses a single model built on a multi-layer bidirectional transformer encoder, and we add a conditional random field (CRF) so that the model learns the constraints between tags. We use ERNIE as the pre-trained model, which is more suitable for the task according to our experiments. In addition, we use adversarial training with the fast gradient method (FGM) to improve the robustness of the system. Our system obtains a 69.85% F1 score, ranking 3rd in the official evaluation.
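To illustrate the BIO formulation mentioned in the abstract, the following is a minimal sketch of how character-level toxic offsets might be converted into token-level B/I/O tags. It is not the authors' implementation: the whitespace tokenizer, the function names `tokenize_with_offsets` and `bio_tags`, and the rule that a token counts as toxic when any of its characters is marked are all assumptions made for illustration. The actual system would use the subword tokenizer of its pre-trained model.

```python
import re


def tokenize_with_offsets(text):
    """Split on whitespace, keeping each token's (start, end) character span.

    Assumption: a plain whitespace tokenizer stands in for a subword tokenizer.
    """
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def bio_tags(text, toxic_chars):
    """Map character-level toxic offsets to one B/I/O tag per token.

    Assumption: a token is toxic if any of its characters is marked toxic.
    Simplification: adjacent toxic tokens are always merged into one span.
    """
    toxic = set(toxic_chars)
    tags = []
    prev_toxic = False
    for _, start, end in tokenize_with_offsets(text):
        is_toxic = any(i in toxic for i in range(start, end))
        if not is_toxic:
            tags.append("O")   # outside any toxic span
        elif prev_toxic:
            tags.append("I")   # continues the current toxic span
        else:
            tags.append("B")   # begins a new toxic span
        prev_toxic = is_toxic
    return tags


text = "you are a stupid idiot ok"
toxic_chars = list(range(10, 22))  # character offsets covering "stupid idiot"
tags = bio_tags(text, toxic_chars)
print(list(zip(text.split(), tags)))
```

A CRF layer on top of the encoder would then score whole tag sequences, which lets the model learn constraints such as "I" not following "O" directly.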
Anthology ID:
2021.semeval-1.117
Volume:
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
Month:
August
Year:
2021
Address:
Online
Venues:
ACL | IJCNLP | SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
870–874
URL:
https://aclanthology.org/2021.semeval-1.117
DOI:
10.18653/v1/2021.semeval-1.117
PDF:
https://aclanthology.org/2021.semeval-1.117.pdf