Conical Classification For Efficient One-Class Topic Determination

Sameer Khanna


Abstract
As the Internet grows in size, so does the amount of text-based information that exists. For many application spaces, it is paramount to isolate and identify texts that relate to a particular topic. While one-class classification would be ideal for such analysis, there is a relative lack of research regarding efficient approaches with high predictive power. By noting that the range of documents we wish to identify can be represented as positive linear combinations of the Vector Space Model representing our text, we propose Conical classification, an approach that allows us to identify whether a document is of a particular topic in a computationally efficient manner. We also propose Normal Exclusion, a modified version of Bi-Normal Separation that is better suited to the one-class classification context. We show in our analysis that our approach not only has higher predictive power on our datasets, but is also faster to compute.
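For readers unfamiliar with Bi-Normal Separation, the following is a minimal sketch of the standard two-class score as it is commonly defined: the absolute difference between the inverse standard normal CDF of a term's true positive rate and that of its false positive rate. It is not the Normal Exclusion variant proposed in the paper, whose definition is not given in the abstract. The function name, argument names, and the clipping constant `eps` are assumptions made for illustration.

```python
from statistics import NormalDist


def bi_normal_separation(tp, fp, pos, neg, eps=0.0005):
    """Standard BNS score for one term (illustrative sketch, not Normal Exclusion).

    tp  : number of positive documents containing the term
    fp  : number of negative documents containing the term
    pos : total number of positive documents
    neg : total number of negative documents
    eps : assumed clipping bound that keeps rates away from 0 and 1,
          where the inverse normal CDF is undefined
    """
    inv_cdf = NormalDist().inv_cdf  # inverse CDF of the standard normal
    tpr = min(max(tp / pos, eps), 1 - eps)
    fpr = min(max(fp / neg, eps), 1 - eps)
    return abs(inv_cdf(tpr) - inv_cdf(fpr))


# A term that appears in 90 of 100 positive documents and 10 of 100 negative
# documents scores higher than one that appears in 60 and 40 respectively.
strong = bi_normal_separation(90, 10, 100, 100)
weak = bi_normal_separation(60, 40, 100, 100)
```

Note that this score needs counts from negative documents, which are not available in a strictly one-class setting; the abstract names this context as the motivation for a modified version.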
Anthology ID:
2021.findings-emnlp.143
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
1662–1673
URL:
https://aclanthology.org/2021.findings-emnlp.143
DOI:
10.18653/v1/2021.findings-emnlp.143
Cite (ACL):
Sameer Khanna. 2021. Conical Classification For Efficient One-Class Topic Determination. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 1662–1673, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Conical Classification For Efficient One-Class Topic Determination (Khanna, Findings 2021)
PDF:
https://aclanthology.org/2021.findings-emnlp.143.pdf
Software:
 2021.findings-emnlp.143.Software.zip
Video:
 https://aclanthology.org/2021.findings-emnlp.143.mp4