Sheng-Wei Chen

2024

pdf bib abs
Random Label Forests: An Ensemble Method with Label Subsampling For Extreme Multi-Label Problems
Sheng-Wei Chen | Chih-Jen Lin
Findings of the Association for Computational Linguistics: EMNLP 2024

Text classification is one of the essential topics in natural language processing, and each text is often associated with multiple labels. Recently, the number of labels has become larger and larger, especially in the applications of e-commerce, so handling text-related e-commerce problems further requires a large memory space in many existing multi-label learning methods. To address the space concern, utilizing a distributed system to share that large memory requirement is a possible solution. We propose “random label forests,” a distributed ensemble method with label subsampling, for handling extremely large-scale labels. Random label forests can reduce memory usage per computer while keeping competitive performances over real-world data sets.

Co-authors

Chih-Jen Lin 1

Venues

findings1

Fix data