Multitask Semi-Supervised Learning for Class-Imbalanced Discourse Classification

Alexander Spangher; Jonathan May; Sz-Rung Shiang; Lingjia Deng

doi:10.18653/v1/2021.emnlp-main.40

Abstract

As labeling schemas evolve over time, small differences can render datasets following older schemas unusable. This prevents researchers from building on top of previous annotation work and results in the existence, in discourse learning in particular, of many small class-imbalanced datasets. In this work, we show that a multitask learning approach can combine discourse datasets from similar and diverse domains to improve discourse classification. We show an improvement of 4.9% Micro F1-score over current state-of-the-art benchmarks on the NewsDiscourse dataset, one of the largest discourse datasets recently published, due in part to label correlations across tasks, which improve performance for underrepresented classes. We also offer an extensive review of additional techniques proposed to address resource-poor problems in NLP, and show that none of these approaches can improve classification accuracy in our setting.

Anthology ID: 2021.emnlp-main.40
Volume: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2021
Address: Online and Punta Cana, Dominican Republic
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 498–517
URL: https://aclanthology.org/2021.emnlp-main.40/
DOI: 10.18653/v1/2021.emnlp-main.40
Cite (ACL): Alexander Spangher, Jonathan May, Sz-Rung Shiang, and Lingjia Deng. 2021. Multitask Semi-Supervised Learning for Class-Imbalanced Discourse Classification. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 498–517, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal): Multitask Semi-Supervised Learning for Class-Imbalanced Discourse Classification (Spangher et al., EMNLP 2021)
PDF: https://aclanthology.org/2021.emnlp-main.40.pdf
Video: https://aclanthology.org/2021.emnlp-main.40.mp4
