Extracting a Knowledge Base of COVID-19 Events from Social Media

Shi Zong; Ashutosh Baheti; Wei Xu; Alan Ritter

Abstract

We present a manually annotated corpus of 10,000 tweets containing public reports of five COVID-19 events: positive and negative tests, deaths, denied access to testing, and claimed cures and preventions. We designed slot-filling questions for each event type and annotated a total of 28 fine-grained slots, such as the location of events, recent travel, and close contacts. We show that our corpus can support fine-tuning BERT-based classifiers to automatically extract publicly reported events, which can then be aggregated to build a knowledge base. Our knowledge base is constructed from two years of Twitter data and currently contains over 4.2M events. It can answer complex queries with high precision, such as “Which organizations have employees that tested positive in Philadelphia?” We believe our proposed methodology could be quickly applied to develop knowledge bases for new domains in response to an emerging crisis, including natural disasters or future disease outbreaks.
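To make the kind of query mentioned in the abstract concrete, the following is a minimal sketch of filtering extracted events by type and slot values. It is not the authors' implementation: the `Event` class, the `tested_positive` label, the slot names `where` and `employer`, and the toy records are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """One extracted event: a type plus slot fillers (hypothetical schema)."""
    event_type: str
    slots: dict = field(default_factory=dict)


def employers_with_positive_tests(events, city):
    """Return sorted employer names for positive-test events located in `city`."""
    found = set()
    for ev in events:
        if ev.event_type != "tested_positive":
            continue
        # Case-insensitive match on the (assumed) location slot.
        if ev.slots.get("where", "").lower() != city.lower():
            continue
        employer = ev.slots.get("employer")
        if employer:  # skip events whose employer slot is unfilled
            found.add(employer)
    return sorted(found)


# Made-up toy records, not data from the paper's knowledge base.
toy_events = [
    Event("tested_positive", {"where": "Philadelphia", "employer": "Example Hospital"}),
    Event("tested_positive", {"where": "philadelphia", "employer": "Example Transit Co"}),
    Event("tested_negative", {"where": "Philadelphia", "employer": "Example Bakery"}),
    Event("tested_positive", {"where": "Atlanta", "employer": "Example Airline"}),
    Event("tested_positive", {"where": "Philadelphia"}),
]

print(employers_with_positive_tests(toy_events, "Philadelphia"))
# prints ['Example Hospital', 'Example Transit Co']
```

A real knowledge base at the scale described would presumably sit behind a database or index rather than a Python list, and would need to normalize location and organization strings; the sketch only shows the shape of the query.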

Anthology ID: 2022.coling-1.335
Volume: Proceedings of the 29th International Conference on Computational Linguistics
Month: October
Year: 2022
Address: Gyeongju, Republic of Korea
Editors: Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
Venue: COLING
Publisher: International Committee on Computational Linguistics
Pages: 3810–3823
URL: https://aclanthology.org/2022.coling-1.335/
Cite (ACL): Shi Zong, Ashutosh Baheti, Wei Xu, and Alan Ritter. 2022. Extracting a Knowledge Base of COVID-19 Events from Social Media. In Proceedings of the 29th International Conference on Computational Linguistics, pages 3810–3823, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal): Extracting a Knowledge Base of COVID-19 Events from Social Media (Zong et al., COLING 2022)
PDF: https://aclanthology.org/2022.coling-1.335.pdf