The Strength of the Weakest Supervision: Topic Classification Using Class Labels

Jiatong Li, Kai Zheng, Hua Xu, Qiaozhu Mei, Yue Wang


Abstract
When developing topic classifiers for real-world applications, we begin by defining a set of meaningful topic labels. Ideally, an intelligent classifier can understand these labels right away and start classifying documents. Indeed, a human can confidently tell if an article is about science, politics, sports, or none of the above, after knowing just the class labels. We study the problem of training an initial topic classifier using only class labels. We investigate existing techniques for solving this problem and propose a simple but effective approach. Experiments on a variety of topic classification data sets show that learning from class labels can save significant initial labeling effort, essentially providing a "free" warm start to the topic classifier.
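The premise that a classifier can start from label names alone can be illustrated with a deliberately simple baseline. The sketch below is not the authors' proposed approach, which the abstract does not detail. It assumes hypothetical short keyword descriptions for each label and scores a document against each one by bag-of-words cosine similarity, returning None for "none of the above". The names `LABELS`, `classify_by_label`, and `pseudo_label` are illustrative only.

```python
import math
import re
from collections import Counter

# Hypothetical label descriptions: the keyword lists are illustrative
# assumptions, not taken from the paper.
LABELS = {
    "science": "science research experiment physics",
    "politics": "politics election government",
    "sports": "sports game team match",
}


def tokenize(text):
    """Lowercase the text and keep runs of letters (a crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items())
    if dot == 0:
        return 0.0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def classify_by_label(doc, label_descriptions, threshold=0.0):
    """Return the most similar label, or None ("none of the above")
    when no label scores strictly above the threshold."""
    doc_vec = Counter(tokenize(doc))
    best_label, best_score = None, threshold
    for label, desc in label_descriptions.items():
        score = cosine(doc_vec, Counter(tokenize(desc)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def pseudo_label(docs, label_descriptions):
    """Keep only documents that received a label; the resulting
    (doc, label) pairs are noisy training data for a regular classifier."""
    pairs = []
    for doc in docs:
        label = classify_by_label(doc, label_descriptions)
        if label is not None:
            pairs.append((doc, label))
    return pairs
```

For example, `classify_by_label("The team won the game last night", LABELS)` picks "sports" because "team" and "game" overlap with that description, while a document sharing no words with any description gets None. Feeding the output of `pseudo_label` to an ordinary supervised learner is one possible reading of the "warm start" idea; matching on raw label words is brittle, which is part of why better ways to exploit class labels are worth studying.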
Anthology ID: N19-3004
Volume: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop
Month: June
Year: 2019
Address: Minneapolis, Minnesota
Editors: Sudipta Kar, Farah Nadeem, Laura Burdick, Greg Durrett, Na-Rae Han
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 22–28
URL: https://aclanthology.org/N19-3004
DOI: 10.18653/v1/N19-3004
Cite (ACL): Jiatong Li, Kai Zheng, Hua Xu, Qiaozhu Mei, and Yue Wang. 2019. The Strength of the Weakest Supervision: Topic Classification Using Class Labels. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop, pages 22–28, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal): The Strength of the Weakest Supervision: Topic Classification Using Class Labels (Li et al., NAACL 2019)
PDF: https://aclanthology.org/N19-3004.pdf
Presentation: N19-3004.Presentation.pptx
Video: https://aclanthology.org/N19-3004.mp4