Revisiting Code Similarity Evaluation with Abstract Syntax Tree Edit Distance

Yewei Song, Cedric Lothritz, Xunzhu Tang, Tegawendé Bissyandé, Jacques Klein


Abstract
This paper revisits recent code similarity evaluation metrics, particularly focusing on the application of Abstract Syntax Tree (AST) editing distance in diverse programming languages. In particular, we explore the usefulness of these metrics and compare them to traditional sequence similarity metrics. Our experiments showcase the effectiveness of AST editing distance in capturing intricate code structures, revealing a high correlation with established metrics. Furthermore, we explore the strengths and weaknesses of AST editing distance and prompt-based GPT similarity scores in comparison to BLEU score, execution match, and Jaccard Similarity. We propose, optimize, and publish an adaptable metric that demonstrates effectiveness across all tested languages, representing an enhanced version of Tree Similarity of Edit Distance (TSED).
Anthology ID:
2024.acl-short.3
Volume:
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
ACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
38–46
Language:
URL:
https://aclanthology.org/2024.acl-short.3
DOI:
Bibkey:
Cite (ACL):
Yewei Song, Cedric Lothritz, Xunzhu Tang, Tegawendé Bissyandé, and Jacques Klein. 2024. Revisiting Code Similarity Evaluation with Abstract Syntax Tree Edit Distance. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 38–46, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Revisiting Code Similarity Evaluation with Abstract Syntax Tree Edit Distance (Song et al., ACL 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.acl-short.3.pdf