R00 at NLP4IF-2021 Fighting COVID-19 Infodemic with Transformers and More Transformers

Ahmed Qarqaz, Dia Abujaber, Malak Abdullah


Abstract
This paper describes the winning model in the Arabic NLP4IF shared task on fighting the COVID-19 infodemic. The goal of the shared task is to check disinformation about COVID-19 in Arabic tweets. Our proposed model ranked 1st with an F1-score of 0.780 and an accuracy of 0.762. We experimented with a variety of transformer-based pre-trained language models in this study. The best-scoring model is an ensemble of the AraBERT-Base, Asafya-BERT, and ARBERT models. One of the study’s key findings is the effect that pre-processing can have on each model’s score. In addition to describing the winning model, the study presents an error analysis.
Anthology ID:
2021.nlp4if-1.15
Volume:
Proceedings of the Fourth Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda
Month:
June
Year:
2021
Address:
Online
Editors:
Anna Feldman, Giovanni Da San Martino, Chris Leberknight, Preslav Nakov
Venue:
NLP4IF
Publisher:
Association for Computational Linguistics
Pages:
104–109
URL:
https://aclanthology.org/2021.nlp4if-1.15
DOI:
10.18653/v1/2021.nlp4if-1.15
Cite (ACL):
Ahmed Qarqaz, Dia Abujaber, and Malak Abdullah. 2021. R00 at NLP4IF-2021 Fighting COVID-19 Infodemic with Transformers and More Transformers. In Proceedings of the Fourth Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda, pages 104–109, Online. Association for Computational Linguistics.
Cite (Informal):
R00 at NLP4IF-2021 Fighting COVID-19 Infodemic with Transformers and More Transformers (Qarqaz et al., NLP4IF 2021)
PDF:
https://aclanthology.org/2021.nlp4if-1.15.pdf
Data:
ArCOV-19