Code Generation From Flowcharts with Texts: A Benchmark Dataset and An Approach

Zejie Liu, Xiaoyu Hu, Deyu Zhou, Lin Li, Xu Zhang, Yanzheng Xiang


Abstract
Currently, researchers focus on generating codes from the requirement documents. However, current approaches still perform poorly on some requirements needing complex problem-solving skills. In reality, to tackle such complex requirements, instead of directly translating requirement documents into codes, software engineers write codes via unified modeling language diagrams, such as flowcharts, an intermediate tool to analyze and visualize the system. Therefore, we propose a new source code generation task, that is, to generate source code from flowcharts with texts. We manually construct a benchmark dataset containing 320 flowcharts with their corresponding source codes. Obviously, it is not straightforward to employ the current approaches for the new source code generation task since (1) the flowchart is a graph that contains various structures, including loop, selection, and others which is different from texts; (2) the connections between nodes in the flowchart are abundant and diverse which need to be carefully handled. To solve the above problems, we propose a two-stage code generation model. In the first stage, a structure recognition algorithm is employed to transform the flowchart into pseudo-code containing the structural conventions of a typical programming language such as while, if. In the second stage, a code generation model is employed to convert the pseudo-code into code. Experimental results show that the proposed approach can achieve some improvement over the baselines.
Anthology ID:
2022.findings-emnlp.449
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2022
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Editors:
Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
6069–6077
Language:
URL:
https://aclanthology.org/2022.findings-emnlp.449
DOI:
10.18653/v1/2022.findings-emnlp.449
Bibkey:
Cite (ACL):
Zejie Liu, Xiaoyu Hu, Deyu Zhou, Lin Li, Xu Zhang, and Yanzheng Xiang. 2022. Code Generation From Flowcharts with Texts: A Benchmark Dataset and An Approach. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 6069–6077, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
Code Generation From Flowcharts with Texts: A Benchmark Dataset and An Approach (Liu et al., Findings 2022)
Copy Citation:
PDF:
https://aclanthology.org/2022.findings-emnlp.449.pdf
Video:
 https://aclanthology.org/2022.findings-emnlp.449.mp4