@inproceedings{kapparad-mohan-2025-tighter,
title = "Tighter Clusters, Safer Code? Improving Vulnerability Detection with Enhanced Contrastive Loss",
author = "Kapparad, Pranav and
Mohan, Biju R.",
editor = "Ebrahimi, Abteen and
Haider, Samar and
Liu, Emmy and
Haider, Sammar and
Leonor Pacheco, Maria and
Wein, Shira",
booktitle = "Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)",
month = apr,
year = "2025",
address = "Albuquerque, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.naacl-srw.24/",
doi = "10.18653/v1/2025.naacl-srw.24",
pages = "247--252",
isbn = "979-8-89176-192-6",
abstract = "Distinguishing vulnerable code from non-vulnerable code is challenging due to high inter-class similarity. Supervised contrastive learning (SCL) improves embedding separation but struggles with intra-class clustering, especially when variations within the same class are subtle. We propose Cluster-Enhanced Supervised Contrastive Loss (CESCL), an extension of SCL with a distance-based regularization term that tightens intra-class clustering while maintaining inter-class separation. Evaluating on CodeBERT and GraphCodeBERT with Binary Cross Entropy (BCE), BCE + SCL, and BCE + CESCL, our method improves F1 score by 1.76{\%} on CodeBERT and 4.1{\%} on GraphCodeBERT, demonstrating its effectiveness in code vulnerability detection and broader applicability to high-similarity classification tasks."
}<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="kapparad-mohan-2025-tighter">
<titleInfo>
<title>Tighter Clusters, Safer Code? Improving Vulnerability Detection with Enhanced Contrastive Loss</title>
</titleInfo>
<name type="personal">
<namePart type="given">Pranav</namePart>
<namePart type="family">Kapparad</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Biju</namePart>
<namePart type="given">R</namePart>
<namePart type="family">Mohan</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-04</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)</title>
</titleInfo>
<name type="personal">
<namePart type="given">Abteen</namePart>
<namePart type="family">Ebrahimi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Samar</namePart>
<namePart type="family">Haider</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Emmy</namePart>
<namePart type="family">Liu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sammar</namePart>
<namePart type="family">Haider</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Maria</namePart>
<namePart type="family">Leonor Pacheco</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Shira</namePart>
<namePart type="family">Wein</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Albuquerque, USA</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-192-6</identifier>
</relatedItem>
<abstract>Distinguishing vulnerable code from non-vulnerable code is challenging due to high inter-class similarity. Supervised contrastive learning (SCL) improves embedding separation but struggles with intra-class clustering, especially when variations within the same class are subtle. We propose Cluster-Enhanced Supervised Contrastive Loss (CESCL), an extension of SCL with a distance-based regularization term that tightens intra-class clustering while maintaining inter-class separation. Evaluating on CodeBERT and GraphCodeBERT with Binary Cross Entropy (BCE), BCE + SCL, and BCE + CESCL, our method improves F1 score by 1.76% on CodeBERT and 4.1% on GraphCodeBERT, demonstrating its effectiveness in code vulnerability detection and broader applicability to high-similarity classification tasks.</abstract>
<identifier type="citekey">kapparad-mohan-2025-tighter</identifier>
<identifier type="doi">10.18653/v1/2025.naacl-srw.24</identifier>
<location>
<url>https://aclanthology.org/2025.naacl-srw.24/</url>
</location>
<part>
<date>2025-04</date>
<extent unit="page">
<start>247</start>
<end>252</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Tighter Clusters, Safer Code? Improving Vulnerability Detection with Enhanced Contrastive Loss
%A Kapparad, Pranav
%A Mohan, Biju R.
%Y Ebrahimi, Abteen
%Y Haider, Samar
%Y Liu, Emmy
%Y Haider, Sammar
%Y Leonor Pacheco, Maria
%Y Wein, Shira
%S Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)
%D 2025
%8 April
%I Association for Computational Linguistics
%C Albuquerque, USA
%@ 979-8-89176-192-6
%F kapparad-mohan-2025-tighter
%X Distinguishing vulnerable code from non-vulnerable code is challenging due to high inter-class similarity. Supervised contrastive learning (SCL) improves embedding separation but struggles with intra-class clustering, especially when variations within the same class are subtle. We propose Cluster-Enhanced Supervised Contrastive Loss (CESCL), an extension of SCL with a distance-based regularization term that tightens intra-class clustering while maintaining inter-class separation. Evaluating on CodeBERT and GraphCodeBERT with Binary Cross Entropy (BCE), BCE + SCL, and BCE + CESCL, our method improves F1 score by 1.76% on CodeBERT and 4.1% on GraphCodeBERT, demonstrating its effectiveness in code vulnerability detection and broader applicability to high-similarity classification tasks.
%R 10.18653/v1/2025.naacl-srw.24
%U https://aclanthology.org/2025.naacl-srw.24/
%U https://doi.org/10.18653/v1/2025.naacl-srw.24
%P 247-252
Markdown (Informal)
[Tighter Clusters, Safer Code? Improving Vulnerability Detection with Enhanced Contrastive Loss](https://aclanthology.org/2025.naacl-srw.24/) (Kapparad & Mohan, NAACL 2025)
ACL