Xin Tao


2021

pdf bib
ZYJ123@DravidianLangTech-EACL2021: Offensive Language Identification based on XLM-RoBERTa with DPCNN
Yingjia Zhao | Xin Tao
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages

The development of online media platforms has given users more opportunities to post and comment freely, but the negative impact of offensive language has become increasingly apparent. It is very necessary for the automatic identification system of offensive language. This paper describes our work on the task of Offensive Language Identification in Dravidian language-EACL 2021. To complete this task, we propose a system based on the multilingual model XLM-Roberta and DPCNN. The test results on the official test data set confirm the effectiveness of our system. The weighted average F1-score of Kannada, Malayalam, and Tami language are 0.69, 0.92, and 0.76 respectively, ranked 6th, 6th, and 3rd

pdf bib
ZYJ at SemEval-2021 Task 7: HaHackathon: Detecting and Rating Humor and Offense with ALBERT-Based Model
Yingjia Zhao | Xin Tao
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This article introduces the submission of subtask 1 and subtask 2 that we participate in SemEval-2021 Task 7: HaHackathon: Detecting and Rating Humor and Offense, we use a model based on ALBERT that uses ALBERT as the module for extracting text features. We modify the upper layer structure by adding specific networks to better summarize the semantic information. Finally, our system achieves an F-Score of 0.9348 in subtask 1a, RMSE of 0.7214 in subtask 1b, F-Score of 0.4603 in subtask 1c, and RMSE of 0.5204 in subtask 2.

pdf bib
ZYJ@LT-EDI-EACL2021:XLM-RoBERTa-Based Model with Attention for Hope Speech Detection
Yingjia Zhao | Xin Tao
Proceedings of the First Workshop on Language Technology for Equality, Diversity and Inclusion

Due to the development of modern computer technology and the increase in the number of online media users, we can see all kinds of posts and comments everywhere on the internet. Hope speech can not only inspire the creators but also make other viewers pleasant. It is necessary to effectively and automatically detect hope speech. This paper describes the approach of our team in the task of hope speech detection. We use the attention mechanism to adjust the weight of all the output layers of XLM-RoBERTa to make full use of the information extracted from each layer, and use the weighted sum of all the output layers to complete the classification task. And we use the Stratified-K-Fold method to enhance the training data set. We achieve a weighted average F1-score of 0.59, 0.84, and 0.92 for Tamil, Malayalam, and English language, ranked 3rd, 2nd, and 2nd.

2020

pdf bib
YNUtaoxin at SemEval-2020 Task 11: Identification Fragments of Propaganda Technique by Neural Sequence Labeling Models with Different Tagging Schemes and Pre-trained Language Model
Xin Tao | Xiaobing Zhou
Proceedings of the Fourteenth Workshop on Semantic Evaluation

We only participated in the first subtask, and a neural sequence model was used to perform the sequence tagging task. We investigated the effects of different markup strategies on model performance. Bert that performed very well in NLP was used as a feature extractor.