Cross-Lingual Disaster-related Multi-label Tweet Classification with Manifold Mixup

Jishnu Ray Chowdhury, Cornelia Caragea, Doina Caragea


Abstract
Distinguishing informative and actionable messages from a social media platform like Twitter is critical for facilitating disaster management. For this purpose, we compile a multilingual dataset of over 130K samples for multi-label classification of disaster-related tweets. We present a masking-based loss function for partially labelled samples and demonstrate the effectiveness of Manifold Mixup in the text domain. Our main model is based on Multilingual BERT, which we further improve with Manifold Mixup. We show that our model generalizes to unseen disasters in the test set. Furthermore, we analyze the capability of our model for zero-shot generalization to new languages. Our code, dataset, and other resources are available on GitHub.
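
The two techniques named in the abstract can be illustrated with a minimal sketch. It is not taken from the authors' repository: the function names `masked_bce` and `manifold_mixup`, the plain-list vectors, and the Beta(alpha, alpha) sampling default are assumptions chosen for illustration. The first function averages binary cross-entropy over annotated labels only, and the second interpolates two hidden representations together with their label vectors.

```python
import math
import random


def masked_bce(probs, targets, mask, eps=1e-12):
    """Binary cross-entropy averaged over annotated labels only.

    probs, targets and mask are equal-length lists for one sample.
    mask[i] is 1 when label i is annotated and 0 when it is missing.
    """
    total = 0.0
    for p, t, m in zip(probs, targets, mask):
        if m:  # skip labels that were never annotated for this sample
            total -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
    n_labelled = sum(mask)
    return total / n_labelled if n_labelled else 0.0


def manifold_mixup(h_a, h_b, y_a, y_b, lam=None, alpha=2.0):
    """Interpolate two hidden vectors and their label vectors.

    lam is the mixing weight; when omitted it is drawn from
    Beta(alpha, alpha). The value alpha=2.0 is an illustrative default.
    """
    if lam is None:
        lam = random.betavariate(alpha, alpha)

    def mix(u, v):
        return [lam * a + (1 - lam) * b for a, b in zip(u, v)]

    return mix(h_a, h_b), mix(y_a, y_b), lam


# One annotated label (target 1, predicted 0.5) and one missing label.
loss = masked_bce([0.5, 0.9], [1, 0], [1, 0])
mixed_h, mixed_y, lam = manifold_mixup([1.0, 0.0], [0.0, 1.0], [1, 0], [0, 1], lam=0.25)
```

In a full model the hidden vectors would come from an intermediate layer of the encoder, the remaining layers would run on the mixed vector, and the mixed labels would serve as soft targets for the masked loss. How the masks of two partially labelled samples should be combined is not stated in the abstract and is left out of this sketch.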
Anthology ID: 2020.acl-srw.39
Volume: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
Month: July
Year: 2020
Address: Online
Editors: Shruti Rijhwani, Jiangming Liu, Yizhong Wang, Rotem Dror
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 292–298
URL: https://aclanthology.org/2020.acl-srw.39
DOI: 10.18653/v1/2020.acl-srw.39
Cite (ACL): Jishnu Ray Chowdhury, Cornelia Caragea, and Doina Caragea. 2020. Cross-Lingual Disaster-related Multi-label Tweet Classification with Manifold Mixup. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pages 292–298, Online. Association for Computational Linguistics.
Cite (Informal): Cross-Lingual Disaster-related Multi-label Tweet Classification with Manifold Mixup (Ray Chowdhury et al., ACL 2020)
PDF: https://aclanthology.org/2020.acl-srw.39.pdf
Video: http://slideslive.com/38928661
Code: JRC1995/Multilingual-BERT-Disaster