Phonemer at WNUT-2020 Task 2: Sequence Classification Using COVID Twitter BERT and Bagging Ensemble Technique based on Plurality Voting

Anshul Wadhawan


Abstract
This paper presents the approach that we employed to tackle the EMNLP WNUT-2020 Shared Task 2: Identification of informative COVID-19 English Tweets. The task is to develop a system that automatically identifies whether an English Tweet related to the novel coronavirus (COVID-19) is informative or not. We solve the task in three stages. The first stage pre-processes the dataset so that only relevant information is retained. In the second stage, we experiment with multiple deep learning models, including CNNs, RNNs, and Transformer-based models. In the last stage, we propose an ensemble of instances of the best-performing model, each trained on a different subset of the provided dataset. Our final approach achieved an F1-score of 0.9037 and was ranked sixth overall, with F1-score as the evaluation criterion.
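The title describes the final system as a bagging ensemble whose members are combined by plurality voting. The sketch below illustrates that general technique under several assumptions: subsets are drawn as bootstrap samples (sampling with replacement), which is the textbook form of bagging but may not match how the paper actually built its subsets; ties are broken in favour of the tied label that appears first; and `train_token_counter` is a hypothetical toy classifier standing in for fine-tuning COVID-Twitter-BERT, so the example runs with the standard library alone. All function names are illustrative, not taken from the paper's code.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Example = Tuple[str, str]          # (tweet text, label)
Classifier = Callable[[str], str]  # maps a tweet to a predicted label


def bootstrap_subsets(data: Sequence[T], n_subsets: int, seed: int = 0) -> List[List[T]]:
    """Draw n_subsets bootstrap samples, each the size of data, sampled with replacement."""
    if not data:
        raise ValueError("data must be non-empty")
    rng = random.Random(seed)
    return [[rng.choice(data) for _ in range(len(data))] for _ in range(n_subsets)]


def plurality_vote(labels: Sequence[str]) -> str:
    """Return the most frequent label; ties go to the tied label seen first (assumed rule)."""
    if not labels:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    top = max(counts.values())
    # Scan in input order so ties resolve to the earliest-seen tied label.
    return next(label for label in labels if counts[label] == top)


def train_token_counter(subset: Sequence[Example]) -> Classifier:
    """Hypothetical stand-in for fine-tuning a transformer on one subset.

    Scores each label by how often the tweet's tokens co-occurred with it
    in the training subset and predicts the highest-scoring label.
    """
    counts = {}  # label -> Counter of lower-cased tokens
    for text, label in subset:
        counts.setdefault(label, Counter()).update(text.lower().split())
    labels = sorted(counts)

    def predict(text: str) -> str:
        tokens = text.lower().split()
        return max(labels, key=lambda lab: sum(counts[lab][t] for t in tokens))

    return predict


def bagging_ensemble(data: Sequence[Example],
                     train_fn: Callable[[Sequence[Example]], Classifier],
                     n_models: int = 5, seed: int = 0) -> Classifier:
    """Train one model per bootstrap subset and combine them by plurality vote."""
    models = [train_fn(s) for s in bootstrap_subsets(data, n_models, seed)]

    def predict(text: str) -> str:
        return plurality_vote([model(text) for model in models])

    return predict


if __name__ == "__main__":
    # Toy, invented examples purely to exercise the code.
    train = [
        ("5 new confirmed cases reported in the city today", "INFORMATIVE"),
        ("death toll rises to 120 says health ministry", "INFORMATIVE"),
        ("stay home and stay safe everyone", "UNINFORMATIVE"),
        ("so bored of this lockdown lol", "UNINFORMATIVE"),
    ]
    clf = bagging_ensemble(train, train_token_counter, n_models=5, seed=42)
    print(clf("health ministry reports 30 new cases"))
```

With real models, `train_fn` would wrap fine-tuning on each subset; plurality voting then needs only each member's predicted label, not its probabilities. For a binary task such as INFORMATIVE vs. UNINFORMATIVE with an odd number of members, plurality voting coincides with simple majority voting and no ties can occur.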
Anthology ID: 2020.wnut-1.47
Volume: Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)
Month: November
Year: 2020
Address: Online
Editors: Wei Xu, Alan Ritter, Tim Baldwin, Afshin Rahimi
Venue: WNUT
Publisher: Association for Computational Linguistics
Pages: 347–351
URL: https://aclanthology.org/2020.wnut-1.47
DOI: 10.18653/v1/2020.wnut-1.47
Cite (ACL): Anshul Wadhawan. 2020. Phonemer at WNUT-2020 Task 2: Sequence Classification Using COVID Twitter BERT and Bagging Ensemble Technique based on Plurality Voting. In Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020), pages 347–351, Online. Association for Computational Linguistics.
Cite (Informal): Phonemer at WNUT-2020 Task 2: Sequence Classification Using COVID Twitter BERT and Bagging Ensemble Technique based on Plurality Voting (Wadhawan, WNUT 2020)
PDF: https://aclanthology.org/2020.wnut-1.47.pdf
Code: anshulwadhawan/BERT_for_sequence_classification_COVID
Data: WNUT-2020 Task 2