MEGClass: Extremely Weakly Supervised Text Classification via Mutually-Enhancing Text Granularities

Priyanka Kargupta, Tanay Komarlu, Susik Yoon, Xuan Wang, Jiawei Han


Abstract
Text classification is essential for organizing unstructured text. Traditional methods rely on human annotations or, more recently, a set of class seed words for supervision, both of which can be costly to obtain, particularly for specialized or emerging domains. To address this, using class surface names alone as extremely weak supervision has been proposed. However, existing approaches treat different levels of text granularity (documents, sentences, or words) independently, disregarding inter-granularity class disagreements and the context identifiable exclusively through joint extraction. To tackle these issues, we introduce MEGClass, an extremely weakly supervised text classification method that leverages Mutually-Enhancing Text Granularities. MEGClass utilizes coarse- and fine-grained context signals obtained by jointly considering a document’s most class-indicative words and sentences. This approach enables the learning of a contextualized document representation that captures the most discriminative class indicators. By preserving the heterogeneity of potential classes, MEGClass can select the most informative class-indicative documents as iterative feedback to enhance the initial word-based class representations and ultimately fine-tune a pre-trained text classifier. Extensive experiments on seven benchmark datasets demonstrate that MEGClass outperforms other weakly and extremely weakly supervised methods.
Anthology ID:
2023.findings-emnlp.708
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
10543–10558
URL:
https://aclanthology.org/2023.findings-emnlp.708
DOI:
10.18653/v1/2023.findings-emnlp.708
Cite (ACL):
Priyanka Kargupta, Tanay Komarlu, Susik Yoon, Xuan Wang, and Jiawei Han. 2023. MEGClass: Extremely Weakly Supervised Text Classification via Mutually-Enhancing Text Granularities. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 10543–10558, Singapore. Association for Computational Linguistics.
Cite (Informal):
MEGClass: Extremely Weakly Supervised Text Classification via Mutually-Enhancing Text Granularities (Kargupta et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-emnlp.708.pdf