Theoretical and Empirical Advantages of Dense-Vector to One-Hot Encoding of Intent Classes in Open-World Scenarios

Paulo Cavalin, Claudio Santos Pinhanez


Abstract
This work explores the intrinsic limitations of the popular one-hot encoding method in intent classification when detection of out-of-scope (OOS) inputs is required. Although recent work has shown that OOS detection can improve significantly when the intent classes are represented as dense vectors based on domain-specific knowledge, we argue in this paper that such gains are more likely due to the much richer topologies that can be created with dense vectors, compared to the equidistant class representation assumed by one-hot encodings. We start by demonstrating how dense-vector encodings can create OOS spaces with much richer topologies. Then, we show empirically, using four standard intent classification datasets, that knowledge-free, randomly generated dense-vector encodings of intent classes can yield gains of over 20% over one-hot encodings, producing better systems for open-world classification tasks, mostly from improvements in OOS detection.
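The contrast between the equidistant one-hot representation and the more varied geometry of dense vectors can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not say how the random dense vectors are generated, so the unit-length Gaussian vectors, the dimension of 8, and the helper names `one_hot`, `random_dense`, and `pairwise_distances` are all assumptions made for illustration.

```python
import itertools
import math
import random


def one_hot(num_classes):
    """One-hot targets: class i gets a 1 in position i and 0 elsewhere."""
    return [[1.0 if i == j else 0.0 for j in range(num_classes)]
            for i in range(num_classes)]


def random_dense(num_classes, dim, seed=0):
    """Knowledge-free dense targets (assumed here: Gaussian, unit length)."""
    rng = random.Random(seed)
    vectors = []
    for _ in range(num_classes):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        vectors.append([x / norm for x in v])
    return vectors


def pairwise_distances(vectors):
    """Euclidean distance between every pair of class vectors."""
    return [math.dist(a, b) for a, b in itertools.combinations(vectors, 2)]


one_hot_d = pairwise_distances(one_hot(5))
dense_d = pairwise_distances(random_dense(5, dim=8))

# Every one-hot pair is sqrt(2) apart; dense pairs differ from one another.
print("one-hot min/max:", min(one_hot_d), max(one_hot_d))
print("dense   min/max:", min(dense_d), max(dense_d))
```

With one-hot targets every pair of classes is exactly the same distance apart, so the layout of the classes carries no structure. With random dense targets the pairwise distances vary, which is one simple way to see the wider range of topologies the abstract refers to.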
Anthology ID:
2024.lrec-main.1391
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
16000–16013
URL:
https://aclanthology.org/2024.lrec-main.1391
Cite (ACL):
Paulo Cavalin and Claudio Santos Pinhanez. 2024. Theoretical and Empirical Advantages of Dense-Vector to One-Hot Encoding of Intent Classes in Open-World Scenarios. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 16000–16013, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Theoretical and Empirical Advantages of Dense-Vector to One-Hot Encoding of Intent Classes in Open-World Scenarios (Cavalin & Pinhanez, LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.1391.pdf