SpanCS:面向跨语言代码生成的片段级语码转换(SpanCS: Span-Level Code-Switching for Cross-Lingual Code Generation)

Zhu Qingfu (朱庆福), Zhou Shiqi (周士祺), Wang Shuo (王硕), Zhang Zhiming (张致铭), Wang Haoyu (王昊钰), Chen Qiguang (陈麒光), Che Wanxiang (车万翔)


Abstract
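Cross-lingual code generation aims to transfer English-to-code generation ability to other natural languages. Translate-Train and Code-Switching are two classic data augmentation approaches for cross-lingual transfer; their strengths are complementary, but they have not yet been combined effectively. To this end, this paper proposes SpanCS, a span-level code-switching method for cross-lingual code generation. First, the method uses the code-switching framework to link source-language context with target-language spans, promoting interaction and alignment across languages. Second, it uses the translate-train approach to extract target-language spans from a complete translation of the source-language text, ensuring semantic consistency between the augmented data and the original data. To fairly evaluate differences in code generation performance across natural languages, the paper builds MHumanEval, a multilingual code generation benchmark covering 10 natural languages, constructed from HumanEval through manual translation and verification. Experiments with three backbone models on this benchmark show that SpanCS consistently outperforms prior data augmentation methods on cross-lingual code generation.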
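The abstract gives only a high-level description of the method, so the following is a minimal sketch of the general idea of span-level code-switching, not the authors' implementation. It assumes the source prompt is already tokenized and that some upstream step (for example, a full translation plus a span aligner) has produced non-overlapping aligned spans; the function name `span_code_switch`, the `AlignedSpan` format, and the `ratio` parameter are all hypothetical.

```python
import random
from typing import List, Tuple

# Hypothetical alignment format: (source_start, source_end, target_text),
# where source_start/source_end index into the source token list (end exclusive)
# and target_text is the matching span taken from a full translation.
AlignedSpan = Tuple[int, int, str]


def span_code_switch(
    source_tokens: List[str],
    aligned_spans: List[AlignedSpan],
    ratio: float = 0.5,
    seed: int = 0,
) -> List[str]:
    """Replace a random subset of source spans with their target-language spans.

    Assumes the spans do not overlap. `ratio` is the probability that any
    given span is switched (an illustrative knob, not taken from the paper).
    """
    rng = random.Random(seed)
    chosen = [span for span in aligned_spans if rng.random() < ratio]

    switched = list(source_tokens)
    # Substitute from right to left so earlier indices remain valid.
    for start, end, target_text in sorted(chosen, key=lambda s: s[0], reverse=True):
        switched[start:end] = [target_text]
    return switched


# Toy usage: an English prompt with two spans aligned to a Chinese translation.
tokens = ["Return", "the", "sum", "of", "two", "numbers"]
spans = [(2, 3, "和"), (4, 6, "两个数")]
mixed = span_code_switch(tokens, spans, ratio=1.0)
print(" ".join(mixed))
```

Because the target spans are drawn from one complete translation rather than translated word by word, the mixed prompt should keep the meaning of the original, which is the property the abstract attributes to combining translate-train with code-switching.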
Anthology ID:
2024.ccl-1.6
Volume:
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
Month:
July
Year:
2024
Address:
Taiyuan, China
Editors:
Maosong Sun, Jiye Liang, Xianpei Han, Zhiyuan Liu, Yulan He
Venue:
CCL
Publisher:
Chinese Information Processing Society of China
Pages:
71–83
Language:
Chinese
URL:
https://aclanthology.org/2024.ccl-1.6/
Cite (ACL):
Zhu Qingfu, Zhou Shiqi, Wang Shuo, Zhang Zhiming, Wang Haoyu, Chen Qiguang, and Che Wanxiang. 2024. SpanCS:面向跨语言代码生成的片段级语码转换(SpanCS: Span-Level Code-Switching for Cross-Lingual Code Generation). In Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference), pages 71–83, Taiyuan, China. Chinese Information Processing Society of China.
Cite (Informal):
SpanCS:面向跨语言代码生成的片段级语码转换(SpanCS: Span-Level Code-Switching for Cross-Lingual Code Generation) (Zhu et al., CCL 2024)
PDF:
https://aclanthology.org/2024.ccl-1.6.pdf