Less is More: Summary of Long Instructions is Better for Program Synthesis

Kirby Kuznia, Swaroop Mishra, Mihir Parmar, Chitta Baral


Abstract
Despite the success of large pre-trained language models (LMs) such as Codex, they show below-par performance on larger and more complicated programming-related questions. We show that LMs benefit from summarized versions of complicated questions. Our findings show that superfluous information often present in problem descriptions, such as human characters, background stories, and names (which are included to help humans understand a task), does not help models understand the task. To this end, we create a meta-dataset from the frequently used APPS dataset and the newly created CodeContests dataset for the program synthesis task. Our meta-dataset consists of human and synthesized summaries of the long and complicated programming questions. Experimental results on Codex show that our proposed approach outperforms the baseline by 8.13% on the APPS dataset and 11.88% on the CodeContests dataset on average in terms of strict accuracy. Our analysis shows that summaries significantly improve performance for introductory (9.86%) and interview (11.48%) programming questions. However, they improve performance by only a small margin (~2%) for competitive programming questions, leaving scope for future research.
Anthology ID:
2022.emnlp-main.301
Volume:
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Editors:
Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
4532–4552
URL:
https://aclanthology.org/2022.emnlp-main.301
DOI:
10.18653/v1/2022.emnlp-main.301
Cite (ACL):
Kirby Kuznia, Swaroop Mishra, Mihir Parmar, and Chitta Baral. 2022. Less is More: Summary of Long Instructions is Better for Program Synthesis. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 4532–4552, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
Less is More: Summary of Long Instructions is Better for Program Synthesis (Kuznia et al., EMNLP 2022)
PDF:
https://aclanthology.org/2022.emnlp-main.301.pdf