Coarse2Fine: Fine-grained Text Classification on Coarsely-grained Annotated Data

Dheeraj Mekala; Varun Gangal; Jingbo Shang

doi:10.18653/v1/2021.emnlp-main.46

Abstract

Existing text classification methods mainly focus on a fixed label set, whereas many real-world applications require extending to new fine-grained classes as the number of samples per label increases. To accommodate such requirements, we introduce a new problem called coarse-to-fine grained classification, which aims to perform fine-grained classification on coarsely annotated data. Instead of asking for new fine-grained human annotations, we opt to leverage label surface names as the only human guidance and weave rich pre-trained generative language models into the iterative weak supervision strategy. Specifically, we first propose a label-conditioned fine-tuning formulation to attune these generators for our task. Furthermore, we devise a regularization objective based on the coarse-fine label constraints derived from our problem setting, which yields further improvements over the prior formulation. Our framework uses the fine-tuned generative models to sample pseudo-training data for training the classifier, and bootstraps on real unlabeled data for model refinement. Extensive experiments and case studies on two real-world datasets demonstrate superior performance over state-of-the-art (SOTA) zero-shot classification baselines.
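The coarse-fine label constraint mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' regularization objective or their implementation; it only shows the underlying idea that a document with a known coarse label can only belong to one of that label's fine-grained children. The hierarchy, label names, scores, and the function `constrained_predict` are all hypothetical and chosen purely for illustration.

```python
# Hypothetical coarse-to-fine label hierarchy (illustrative names, not from the paper).
HIERARCHY = {
    "sports": ["baseball", "hockey"],
    "science": ["space", "medicine"],
}


def constrained_predict(scores, coarse_label, hierarchy=HIERARCHY):
    """Return the best-scoring fine label among the children of coarse_label.

    `scores` maps fine label names to classifier scores (assumed to be given).
    Fine labels missing from `scores` are treated as having the lowest score.
    """
    children = hierarchy[coarse_label]
    return max(children, key=lambda fine: scores.get(fine, float("-inf")))


# Assumed scores from some fine-grained classifier for a single document.
scores = {"baseball": 0.10, "hockey": 0.25, "space": 0.60, "medicine": 0.05}

# The document is annotated with the coarse label "sports", so only
# "baseball" and "hockey" are admissible, even though "space" scores highest.
print(constrained_predict(scores, "sports"))
```

In this sketch the constraint is applied as a hard filter at prediction time; the abstract instead describes a regularization objective used while fine-tuning the generators, which this example does not attempt to reproduce.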

Anthology ID: 2021.emnlp-main.46
Volume: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2021
Address: Online and Punta Cana, Dominican Republic
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 583–594
URL: https://aclanthology.org/2021.emnlp-main.46/
DOI: 10.18653/v1/2021.emnlp-main.46
Cite (ACL): Dheeraj Mekala, Varun Gangal, and Jingbo Shang. 2021. Coarse2Fine: Fine-grained Text Classification on Coarsely-grained Annotated Data. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 583–594, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal): Coarse2Fine: Fine-grained Text Classification on Coarsely-grained Annotated Data (Mekala et al., EMNLP 2021)
PDF: https://aclanthology.org/2021.emnlp-main.46.pdf
Video: https://aclanthology.org/2021.emnlp-main.46.mp4
