Qihui Lin


2021

HITSZ-HLT at SemEval-2021 Task 5: Ensemble Sequence Labeling and Span Boundary Detection for Toxic Span Detection
Qinglin Zhu | Zijie Lin | Yice Zhang | Jingyi Sun | Xiang Li | Qihui Lin | Yixue Dang | Ruifeng Xu
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This paper presents the winning system that participated in SemEval-2021 Task 5: Toxic Spans Detection. The task aims to locate the spans within a text that contribute to its toxicity, which is crucial for semi-automated moderation of online discussions. We formalize the task separately as a Sequence Labeling (SL) problem and as a Span Boundary Detection (SBD) problem, and employ three state-of-the-art models. We then integrate the predictions of these models to produce a more credible and complementary result. Our system achieves a char-level score of 70.83%, ranking 1st of 91. In addition, we explore a lexicon-based method, which is strongly interpretable and flexible in practice.
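
To make the two quantitative ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each model's output is a set of toxic character offsets, that the models are combined by simple vote counting (the abstract does not say how predictions are integrated), and that the char-level score is an F1 over character offsets. The names `char_f1`, `majority_vote`, and `min_votes` are hypothetical.

```python
from collections import Counter
from typing import List, Set


def char_f1(pred: Set[int], gold: Set[int]) -> float:
    """F1 between predicted and gold toxic character offsets for one post.

    Assumption: two empty sets count as a perfect match (1.0).
    """
    if not pred and not gold:
        return 1.0
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def majority_vote(predictions: List[Set[int]], min_votes: int) -> Set[int]:
    """Keep an offset if at least `min_votes` models marked it as toxic.

    Assumption: plain vote counting stands in for the unspecified
    integration step described in the abstract.
    """
    counts = Counter(offset for pred in predictions for offset in pred)
    return {offset for offset, votes in counts.items() if votes >= min_votes}


# Three hypothetical model outputs for the same post.
model_outputs = [{0, 1, 2, 3}, {2, 3, 4}, {3, 4, 5}]
ensemble = majority_vote(model_outputs, min_votes=2)
score = char_f1(ensemble, gold={2, 3, 4, 5})
```

Raising `min_votes` favors precision and lowering it favors recall, which is one simple way an ensemble can trade off the strengths of sequence-labeling and boundary-detection models.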