Advancing Topical Text Classification: A Novel Distance-Based Method with Contextual Embeddings

Andriy Kosar, Guy De Pauw, Walter Daelemans


Abstract
This study introduces a new method for distance-based unsupervised topical text classification using contextual embeddings. The method applies sentence embeddings to distance-based topical text classification and tailors them to the task by leveraging the semantic similarity between topic labels and text content and by reinforcing the relationship between the two in a shared semantic space. On average, the proposed method outperforms a wide range of existing sentence embeddings by 35%. As an alternative to the commonly used transformer-based, general-purpose zero-shot classifiers for multiclass text classification, the method offers significant advantages in computational efficiency and flexibility while maintaining comparable or improved classification results.
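The basic idea of distance-based topical classification can be illustrated with a minimal sketch. It assumes that texts and topic labels have already been encoded into the same vector space by some sentence-embedding model. The vectors below are hypothetical toy values standing in for real embeddings, and the helper names (`cosine_similarity`, `rank_topics`, `classify`, `LABEL_VECS`) are illustrative only. The code shows only the generic step of assigning each text to its nearest label by cosine similarity. It does not reproduce the authors' tailoring of the embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_topics(text_vec: Vector, label_vecs: Dict[str, Vector]) -> List[Tuple[str, float]]:
    """Score every topic label against the text, most similar first."""
    scores = [(label, cosine_similarity(text_vec, vec)) for label, vec in label_vecs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def classify(text_vec: Vector, label_vecs: Dict[str, Vector]) -> str:
    """Assign the topic whose label embedding is nearest to the text embedding."""
    return rank_topics(text_vec, label_vecs)[0][0]


# Hypothetical 3-dimensional toy vectors standing in for sentence embeddings
# of topic labels; a real model would produce much higher-dimensional vectors.
LABEL_VECS = {
    "sports": [0.9, 0.1, 0.0],
    "politics": [0.1, 0.9, 0.1],
    "technology": [0.0, 0.2, 0.9],
}

if __name__ == "__main__":
    text_vec = [0.8, 0.2, 0.1]  # hypothetical embedding of a sports-related text
    print(classify(text_vec, LABEL_VECS))  # prints "sports"
    for label, score in rank_topics(text_vec, LABEL_VECS):
        print(f"{label}: {score:.3f}")
```

In this setup the label embeddings are computed once. Each text then needs a single encoding pass plus one similarity computation per label. NLI-based zero-shot pipelines, by contrast, typically run the model once per text-label pair. This difference is one plausible source of the efficiency advantage described above.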
Anthology ID: 2023.ranlp-1.64
Volume: Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing
Month: September
Year: 2023
Address: Varna, Bulgaria
Editors: Ruslan Mitkov, Galia Angelova
Venue: RANLP
Publisher: INCOMA Ltd., Shoumen, Bulgaria
Pages: 586–597
URL: https://aclanthology.org/2023.ranlp-1.64
Cite (ACL): Andriy Kosar, Guy De Pauw, and Walter Daelemans. 2023. Advancing Topical Text Classification: A Novel Distance-Based Method with Contextual Embeddings. In Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing, pages 586–597, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal): Advancing Topical Text Classification: A Novel Distance-Based Method with Contextual Embeddings (Kosar et al., RANLP 2023)
PDF: https://aclanthology.org/2023.ranlp-1.64.pdf