Accelerating Code Search with Deep Hashing and Code Classification

Wenchao Gu, Yanlin Wang, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Michael Lyu


Abstract
Code search retrieves reusable code snippets from a source code corpus based on natural language queries. Deep learning-based methods for code search have shown promising results. However, previous methods focus on retrieval accuracy and pay little attention to the efficiency of the retrieval process. We propose a novel method, CoSHC, to accelerate code search with deep hashing and code classification, aiming to perform efficient code search without sacrificing too much accuracy. To evaluate the effectiveness of CoSHC, we apply our method to five code search models. Extensive experimental results indicate that, compared with previous code search baselines, CoSHC can save more than 90% of retrieval time while preserving at least 99% of retrieval accuracy.
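To illustrate why hashing can reduce retrieval time, the sketch below shows one generic two-stage pattern: recall a small candidate set by Hamming distance over binary codes, then re-rank only those candidates with the original dense embeddings. This is not the CoSHC implementation. The abstract does not specify the architecture, so the sign-based binarization (a stand-in for a learned hashing model), the function names, and the parameters k and top_n are all assumptions made for illustration, and the code classification component is omitted entirely.

```python
import numpy as np


def to_hash_codes(embeddings: np.ndarray) -> np.ndarray:
    """Binarize embeddings by sign (assumed stand-in for a learned hash)."""
    return (embeddings > 0).astype(np.uint8)


def hamming_recall(query_code: np.ndarray, corpus_codes: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k corpus items closest in Hamming distance."""
    distances = np.count_nonzero(corpus_codes != query_code, axis=1)
    return np.argsort(distances, kind="stable")[:k]


def rerank(query_vec: np.ndarray, corpus_vecs: np.ndarray,
           candidates: np.ndarray, top_n: int) -> np.ndarray:
    """Re-rank the recalled candidates by cosine similarity of dense vectors."""
    cand = corpus_vecs[candidates]
    norms = np.linalg.norm(cand, axis=1) * np.linalg.norm(query_vec) + 1e-12
    sims = (cand @ query_vec) / norms
    order = np.argsort(-sims, kind="stable")[:top_n]
    return candidates[order]


def two_stage_search(query_vec: np.ndarray, corpus_vecs: np.ndarray,
                     corpus_codes: np.ndarray, k: int = 100,
                     top_n: int = 10) -> np.ndarray:
    """Cheap binary recall first, expensive dense scoring on k items only."""
    query_code = to_hash_codes(query_vec[None, :])[0]
    candidates = hamming_recall(query_code, corpus_codes, k)
    return rerank(query_vec, corpus_vecs, candidates, top_n)
```

In this sketch the dense similarity is computed for only k candidates rather than the whole corpus, which is where the time saving would come from; the accuracy cost depends on how often the true match survives the Hamming recall step.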
Anthology ID:
2022.acl-long.181
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
2534–2544
URL:
https://aclanthology.org/2022.acl-long.181
DOI:
10.18653/v1/2022.acl-long.181
Cite (ACL):
Wenchao Gu, Yanlin Wang, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, and Michael Lyu. 2022. Accelerating Code Search with Deep Hashing and Code Classification. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2534–2544, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Accelerating Code Search with Deep Hashing and Code Classification (Gu et al., ACL 2022)
PDF:
https://aclanthology.org/2022.acl-long.181.pdf
Data:
CodeSearchNet