Python Code Generation by Asking Clarification Questions

Haau-Sing (Xiaocheng) Li, Mohsen Mesgar, André Martins, Iryna Gurevych


Abstract
Code generation from text requires understanding the user’s intent from a natural language description and generating an executable code snippet that satisfies this intent. While recent pretrained language models demonstrate remarkable performance on this task, these models fail when the given natural language description is under-specified. In this work, we introduce a novel and more realistic setup for this task. We hypothesize that the under-specification of a natural language description can be resolved by asking clarification questions. We therefore collect and introduce a new dataset named CodeClarQA, which contains pairs of natural language descriptions and code together with synthetically created clarification questions and answers. Our empirical evaluation of pretrained language models on code generation shows that clarifications yield more precisely generated code, with substantial improvements in model performance across all evaluation metrics. In addition, our task and dataset introduce new challenges to the community, including when and what clarification questions should be asked. Our code and dataset are available on GitHub.
Anthology ID:
2023.acl-long.799
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
14287–14306
URL:
https://aclanthology.org/2023.acl-long.799
DOI:
10.18653/v1/2023.acl-long.799
Cite (ACL):
Haau-Sing (Xiaocheng) Li, Mohsen Mesgar, André Martins, and Iryna Gurevych. 2023. Python Code Generation by Asking Clarification Questions. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 14287–14306, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Python Code Generation by Asking Clarification Questions (Li et al., ACL 2023)
PDF:
https://aclanthology.org/2023.acl-long.799.pdf
Video:
https://aclanthology.org/2023.acl-long.799.mp4