Prompt Tuned Embedding Classification for Industry Sector Allocation

Valentin Buchner, Lele Cao, Jan-Christoph Kalo, Vilhelm Von Ehrenheim


Abstract
We introduce Prompt Tuned Embedding Classification (PTEC) for classifying companies within an investment firm’s proprietary industry taxonomy, supporting their thematic investment strategy. PTEC assigns companies to the sectors they primarily operate in, conceptualizing this process as a multi-label text classification task. Prompt Tuning, usually deployed as a text-to-text (T2T) classification approach, ensures low computational cost while maintaining high task performance. However, T2T classification has limitations on multi-label tasks due to the generation of non-existing labels, permutation invariance of the label sequence, and a lack of confidence scores. PTEC addresses these limitations by utilizing a classification head in place of the Large Language Model’s (LLM’s) language head. PTEC surpasses both baselines and human performance while lowering computational demands. This indicates the continuing need to adapt state-of-the-art methods to domain-specific tasks, even in the era of LLMs with strong generalization abilities.
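To make the mechanism described in the abstract concrete, below is a minimal PyTorch sketch, not the authors' implementation: the class name, dimensions, number of prompt tokens, and the toy stand-in backbone are illustrative assumptions. It shows the core idea of PTEC as stated in the abstract: trainable soft-prompt embeddings are prepended to a frozen LLM's input embeddings, and a linear classification head over the final hidden state replaces the language head, so each industry sector receives an independent sigmoid confidence score for multi-label prediction.

```python
# Hypothetical PTEC-style sketch; names and sizes are placeholders, not from the paper's code.
import torch
import torch.nn as nn

class PTECClassifier(nn.Module):
    def __init__(self, backbone: nn.Module, embed: nn.Embedding,
                 n_prompt_tokens: int, n_labels: int):
        super().__init__()
        self.backbone = backbone      # frozen LLM body (toy stand-in below)
        self.embed = embed            # frozen token-embedding layer of the LLM
        for p in self.backbone.parameters():
            p.requires_grad = False
        for p in self.embed.parameters():
            p.requires_grad = False
        d = embed.embedding_dim
        # Trainable soft prompt: the only tuned parameters besides the head
        self.soft_prompt = nn.Parameter(torch.randn(n_prompt_tokens, d) * 0.02)
        # Classification head replacing the language head: one logit per sector label
        self.head = nn.Linear(d, n_labels)

    def forward(self, input_ids: torch.LongTensor) -> torch.Tensor:
        tok = self.embed(input_ids)                                # (B, T, d)
        prompt = self.soft_prompt.unsqueeze(0).expand(tok.size(0), -1, -1)
        hidden = self.backbone(torch.cat([prompt, tok], dim=1))    # (B, P+T, d)
        pooled = hidden[:, -1]                                     # last-token representation
        return self.head(pooled)                                   # multi-label logits

# Toy stand-in backbone so the sketch runs end-to-end; in practice this would be
# a pretrained LLM returning final hidden states.
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True), num_layers=2)
embed = nn.Embedding(1000, 64)
model = PTECClassifier(backbone, embed, n_prompt_tokens=8, n_labels=20)

logits = model(torch.randint(0, 1000, (2, 16)))
loss = nn.BCEWithLogitsLoss()(logits, torch.zeros(2, 20))   # multi-label objective
probs = torch.sigmoid(logits)                               # per-sector confidence scores
```

Because each label gets its own sigmoid score, this setup cannot emit non-existing labels, is indifferent to label ordering, and yields explicit confidence values, which is precisely how the abstract motivates replacing T2T generation with a classification head.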
Anthology ID:
2024.naacl-industry.10
Volume:
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 6: Industry Track)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Yi Yang, Aida Davani, Avi Sil, Anoop Kumar
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
108–118
URL:
https://aclanthology.org/2024.naacl-industry.10
DOI:
10.18653/v1/2024.naacl-industry.10
Cite (ACL):
Valentin Buchner, Lele Cao, Jan-Christoph Kalo, and Vilhelm Von Ehrenheim. 2024. Prompt Tuned Embedding Classification for Industry Sector Allocation. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 6: Industry Track), pages 108–118, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Prompt Tuned Embedding Classification for Industry Sector Allocation (Buchner et al., NAACL 2024)
PDF:
https://aclanthology.org/2024.naacl-industry.10.pdf