Testing the Effect of Code Documentation on Large Language Model Code Understanding

William Macke, Michael Doyle


Abstract
Large Language Models (LLMs) have demonstrated impressive abilities in recent years with regard to code generation and understanding. However, little work has investigated how documentation and other code properties affect an LLM's ability to understand and generate code or documentation. We present an empirical analysis of how underlying properties of code or documentation can affect an LLM's capabilities. We show that providing an LLM with "incorrect" documentation can greatly hinder code understanding, while incomplete or missing documentation does not appear to significantly affect an LLM's ability to understand code.
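To illustrate the documentation conditions the abstract contrasts, the following Python sketch (our own illustration, not code or data from the paper) shows the same function once with "incorrect" documentation, whose docstring contradicts the implementation, and once with no documentation at all.

```python
# Hypothetical illustration only; these functions do not come from the paper.

def median_with_incorrect_doc(values):
    """Return the arithmetic mean of `values`."""  # docstring contradicts the code below
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def median_without_doc(values):
    # Same implementation, but with no docstring at all.
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5, 9]
    # Both functions compute the median (3.5 here); only the documentation differs.
    print(median_with_incorrect_doc(data), median_without_doc(data))
```

In the first case a reader (human or LLM) must reconcile a misleading docstring with the actual behavior; in the second there is simply no guidance, which the abstract suggests is far less harmful.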
Anthology ID:
2024.findings-naacl.66
Volume:
Findings of the Association for Computational Linguistics: NAACL 2024
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
1044–1050
URL:
https://aclanthology.org/2024.findings-naacl.66
Cite (ACL):
William Macke and Michael Doyle. 2024. Testing the Effect of Code Documentation on Large Language Model Code Understanding. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 1044–1050, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Testing the Effect of Code Documentation on Large Language Model Code Understanding (Macke & Doyle, Findings 2024)
PDF:
https://aclanthology.org/2024.findings-naacl.66.pdf
Copyright:
2024.findings-naacl.66.copyright.pdf