Softmax Tree: An Accurate, Fast Classifier When the Number of Classes Is Large
Authors: Arman Zharmagambetov, Magzhan Gabidolla, Miguel A. Carreira-Perpinan
Date: November 2021
Type: conference publication (text)
Venue: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Publisher: Association for Computational Linguistics
Location: Online and Punta Cana, Dominican Republic
Abstract: Classification problems with thousands or more classes naturally occur in NLP, for example in language modeling or document classification. A softmax or one-vs-all classifier naturally handles many classes, but it is very slow at inference time, because every class score must be calculated to find the top class. We propose the "softmax tree", consisting of a binary tree having sparse hyperplanes at the decision nodes (which make hard, not soft, decisions) and small softmax classifiers at the leaves. This is much faster at inference because the input instance follows a single path to a leaf (whose length is logarithmic in the number of leaves) and the softmax classifier at each leaf operates on a small subset of the classes. Although learning accurate tree-based models has proven difficult in the past, we are able to overcome this by using a variation of a recent algorithm, tree alternating optimization (TAO). Compared to a softmax and other classifiers, the resulting softmax trees are both more accurate in prediction and faster in inference, as shown in NLP problems having from one thousand to one hundred thousand classes.
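The inference procedure the abstract describes (route the input down a single root-to-leaf path via hard hyperplane decisions, then apply a small softmax over the leaf's class subset) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation; the `Node` class and `predict` function are hypothetical names, and training (via TAO) is not shown.

```python
import numpy as np

class Node:
    """Hypothetical softmax-tree node: internal nodes hold a (sparse)
    hyperplane (w, b); leaves hold a small softmax (W, c) over a
    subset of the class labels."""
    def __init__(self, w=None, b=0.0, left=None, right=None,
                 classes=None, W=None, c=None):
        self.w, self.b = w, b              # internal: routing hyperplane
        self.left, self.right = left, right
        self.classes = classes             # leaf: subset of class labels
        self.W, self.c = W, c              # leaf: softmax weights/biases

def predict(node, x):
    """Follow a single root-to-leaf path, then score only the leaf's
    classes. Since softmax is monotone, argmax over the raw scores
    equals argmax over the softmax probabilities."""
    while node.classes is None:            # descend until a leaf
        node = node.left if x @ node.w + node.b <= 0 else node.right
    scores = node.W @ x + node.c           # softmax over |classes| << K labels
    return node.classes[int(np.argmax(scores))]
```

The cost per input is one dot product per level of the tree plus one small matrix-vector product at the leaf, rather than scoring all K classes as a flat softmax would.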
Anthology ID: zharmagambetov-etal-2021-softmax
DOI: 10.18653/v1/2021.emnlp-main.838
URL: https://aclanthology.org/2021.emnlp-main.838
Pages: 10730–10745