OCHADAI at SMM4H-2021 Task 5: Classifying self-reporting tweets on potential cases of COVID-19 by ensembling pre-trained language models

Ying Luo; Lis Pereira; Kobayashi Ichiro

doi:10.18653/v1/2021.smm4h-1.25

Abstract

Since the outbreak of coronavirus at the end of 2019, there have been numerous studies on coronavirus in the NLP arena. Meanwhile, Twitter has been a valuable source of news and a public medium for the conveyance of information and personal expression. This paper describes the system developed by the Ochadai team for the Social Media Mining for Health Applications (SMM4H) 2021 Task 5, which aims to automatically distinguish English tweets that self-report potential cases of COVID-19 from those that do not. We proposed a model ensemble that leverages pre-trained representations from COVID-Twitter-BERT (Müller et al., 2020), RoBERTa (Liu et al., 2019), and Twitter-RoBERTa (Glazkova et al., 2021). Our model obtained F1-scores of 76% on the test set in the evaluation phase, and 77.5% in the post-evaluation phase.
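
The abstract does not state how the three models' outputs are combined, so the following is only a minimal sketch of one common option, soft voting, in which each model's predicted probability for the self-report class is averaged and then thresholded. The function names `soft_vote` and `f1_positive`, the 0.5 threshold, the example probabilities, and the choice of positive-class F1 as the metric are all assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Sequence


def soft_vote(prob_sets: Sequence[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Average per-model positive-class probabilities, then threshold.

    prob_sets[m][i] is model m's probability that tweet i is a self-report.
    The averaging rule and the threshold are illustrative assumptions.
    """
    n_models = len(prob_sets)
    n_examples = len(prob_sets[0])
    labels = []
    for i in range(n_examples):
        mean_p = sum(model[i] for model in prob_sets) / n_models
        labels.append(1 if mean_p >= threshold else 0)
    return labels


def f1_positive(gold: Sequence[int], pred: Sequence[int]) -> float:
    """F1-score for the positive class (label 1)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up probabilities standing in for three fine-tuned classifiers.
example_probs = [
    [0.9, 0.2, 0.6, 0.4],  # hypothetical model A
    [0.8, 0.4, 0.3, 0.7],  # hypothetical model B
    [0.7, 0.1, 0.5, 0.3],  # hypothetical model C
]
example_gold = [1, 0, 1, 0]
example_pred = soft_vote(example_probs)
example_f1 = f1_positive(example_gold, example_pred)
```

In practice the probabilities would come from classifiers fine-tuned on the task data; other combination rules, such as majority voting or weighted averaging, would fit the same interface.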

Anthology ID: 2021.smm4h-1.25
Volume: Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task
Month: June
Year: 2021
Address: Mexico City, Mexico
Editors: Arjun Magge, Ari Klein, Antonio Miranda-Escalada, Mohammed Ali Al-garadi, Ilseyar Alimova, Zulfat Miftahutdinov, Eulalia Farre-Maduell, Salvador Lima Lopez, Ivan Flores, Karen O'Connor, Davy Weissenbacher, Elena Tutubalina, Abeed Sarker, Juan M Banda, Martin Krallinger, Graciela Gonzalez-Hernandez
Venue: SMM4H
Publisher: Association for Computational Linguistics
Pages: 123–125
URL: https://aclanthology.org/2021.smm4h-1.25/
DOI: 10.18653/v1/2021.smm4h-1.25
Cite (ACL): Ying Luo, Lis Pereira, and Kobayashi Ichiro. 2021. OCHADAI at SMM4H-2021 Task 5: Classifying self-reporting tweets on potential cases of COVID-19 by ensembling pre-trained language models. In Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task, pages 123–125, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): OCHADAI at SMM4H-2021 Task 5: Classifying self-reporting tweets on potential cases of COVID-19 by ensembling pre-trained language models (Luo et al., SMM4H 2021)
PDF: https://aclanthology.org/2021.smm4h-1.25.pdf