Improving Repository-level Code Search with Text Conversion

Mizuki Kondo, Daisuke Kawahara, Toshiyuki Kurabayashi


Abstract
The ability of large language models (LLMs) to generate code has been improving year by year. However, research on code generation at the repository level remains limited. Repository-level code generation requires referring to related code snippets spread across multiple files: related files are retrieved by computing similarities between code snippets and are then given to an LLM as input for generation. This paper proposes a method for retrieving related files (code search) that computes similarities not between the code snippets themselves but between the texts that an LLM converts those code snippets into. We confirmed that this text conversion improves the accuracy of code search.
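The following is a minimal sketch of the general idea described in the abstract, not the authors' exact pipeline: each code snippet is first converted into a natural-language description by an LLM, the descriptions are embedded, and candidate files are ranked by cosine similarity to the query. The model names, prompt wording, and API choice below are illustrative assumptions.

```python
# Sketch of text-conversion-based code search (illustrative only; the models,
# prompt, and scoring here are assumptions, not the paper's exact setup).
import numpy as np
from openai import OpenAI  # assumed LLM / embedding provider

client = OpenAI()


def code_to_text(snippet: str) -> str:
    """Convert a code snippet into a natural-language description via an LLM."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model
        messages=[
            {"role": "system", "content": "Describe what this code does in plain English."},
            {"role": "user", "content": snippet},
        ],
    )
    return response.choices[0].message.content


def embed(text: str) -> np.ndarray:
    """Embed a natural-language description into a vector."""
    response = client.embeddings.create(
        model="text-embedding-3-small",  # assumed model
        input=text,
    )
    return np.array(response.data[0].embedding)


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def search_related_files(query_snippet: str, repo_snippets: dict[str, str], top_k: int = 3):
    """Rank repository files by similarity between LLM-generated descriptions,
    rather than between the raw code snippets."""
    query_vec = embed(code_to_text(query_snippet))
    scores = {
        path: cosine(query_vec, embed(code_to_text(code)))
        for path, code in repo_snippets.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```

The retrieved files would then be placed in the LLM's prompt as context for repository-level code generation.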
Anthology ID: 2024.naacl-srw.15
Volume: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)
Month: June
Year: 2024
Address: Mexico City, Mexico
Editors: Yang (Trista) Cao, Isabel Papadimitriou, Anaelia Ovalle
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 130–137
URL: https://aclanthology.org/2024.naacl-srw.15
Cite (ACL): Mizuki Kondo, Daisuke Kawahara, and Toshiyuki Kurabayashi. 2024. Improving Repository-level Code Search with Text Conversion. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop), pages 130–137, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): Improving Repository-level Code Search with Text Conversion (Kondo et al., NAACL 2024)
PDF: https://aclanthology.org/2024.naacl-srw.15.pdf