Yinxu Pan
2024
DebugBench: Evaluating Debugging Capability of Large Language Models
Runchu Tian | Yining Ye | Yujia Qin | Xin Cong | Yankai Lin | Yinxu Pan | Yesai Wu | Hui Haotian | Liu Weichuan | Zhiyuan Liu | Maosong Sun
Findings of the Association for Computational Linguistics: ACL 2024
Large Language Models (LLMs) have demonstrated exceptional coding capability. However, the debugging capability of LLMs, another critical component of programming proficiency, remains relatively unexplored. Previous evaluations of LLMs’ debugging ability are significantly limited by the risk of data leakage, the scale of the dataset, and the variety of tested bugs. To overcome these deficiencies, we introduce ‘DebugBench’, an LLM debugging benchmark consisting of 4,253 instances. It covers four major bug categories and 18 minor types in C++, Java, and Python. To construct DebugBench, we collect code snippets from the LeetCode community, implant bugs into the source data with GPT-4, and apply rigorous quality checks. We evaluate two commercial and four open-source models in a zero-shot scenario. We find that (1) while closed-source models exhibit debugging performance inferior to that of humans, open-source models achieve relatively lower pass rates; (2) the difficulty of debugging varies notably depending on the bug category; and (3) incorporating runtime feedback has a clear impact on debugging performance, but it is not always helpful. As an extension, we also compare LLM debugging and code generation, revealing a strong correlation between them for closed-source models. These findings will benefit the development of LLMs in debugging.
2022
ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding
Qiming Peng | Yinxu Pan | Wenjin Wang | Bin Luo | Zhenyu Zhang | Zhengjie Huang | Yuhui Cao | Weichong Yin | Yongfeng Chen | Yin Zhang | Shikun Feng | Yu Sun | Hao Tian | Hua Wu | Haifeng Wang
Findings of the Association for Computational Linguistics: EMNLP 2022
Recent years have witnessed the rise and success of pre-training techniques in visually-rich document understanding. However, most existing methods lack systematic mining and utilization of layout-centered knowledge, leading to sub-optimal performance. In this paper, we propose ERNIE-Layout, a novel document pre-training solution with layout knowledge enhancement throughout the whole workflow, to learn better representations that combine features from text, layout, and image. Specifically, we first rearrange input sequences in the serialization stage, and then present a correlative pre-training task, reading order prediction, to learn the proper reading order of documents. To improve the layout awareness of the model, we integrate a spatial-aware disentangled attention into the multi-modal transformer and a replaced regions prediction task into the pre-training phase. Experimental results show that ERNIE-Layout achieves superior performance on various downstream tasks, setting a new state of the art on key information extraction, document image classification, and document question answering datasets. The code and models are publicly available at PaddleNLP.