Hao Chen

UC Davis

Other people with similar names: Hao Chen (Tsinghua), Hao Chen (Chinese Academy of Sciences), Hao Chen (South China Normal University), Hao Chen (HKUST), Hao Chen (Nankai), Hao Chen (Hong Kong Polytechnic), Hao Chen, Hao Chen (Zhejiang), Hao Chen (Dalian, Alibaba)

Unverified author pages with similar names: Hao Chen

2025

Active Layer-Contrastive Decoding Reduces Hallucination in Large Language Model Generation
Hongxiang Zhang | Hao Chen | Muhao Chen | Tianyi Zhang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Recent decoding methods improve the factuality of large language models (LLMs) by refining how the next token is selected during generation. These methods typically operate at the token level, leveraging internal representations to suppress superficial patterns. Nevertheless, LLMs remain prone to hallucinations, especially over longer contexts. In this paper, we propose Active Layer-Contrastive Decoding (ActLCD), a novel decoding strategy that actively decides when to apply contrasting layers during generation. By casting decoding as a sequential decision-making problem, ActLCD employs a reinforcement learning policy guided by a reward-aware classifier to optimize factuality beyond the token level. Our experiments demonstrate that ActLCD surpasses state-of-the-art methods across five benchmarks, showcasing its effectiveness in mitigating hallucinations in diverse generation scenarios.

FuzzAug: Data Augmentation by Coverage-guided Fuzzing for Neural Test Generation
Yifeng He | Jicheng Wang | Yuyang Rong | Hao Chen
Findings of the Association for Computational Linguistics: EMNLP 2025

Testing is essential to modern software engineering for building reliable software. Given the high costs of manually creating test cases, automated test case generation, particularly methods utilizing large language models, has become increasingly popular. These neural approaches generate semantically meaningful tests that are more maintainable compared with traditional automated testing methods such as fuzzing. However, the diversity and volume of unit tests in current datasets are limited, especially for newer but important languages. In this paper, we present a novel data augmentation technique, FuzzAug, that brings the benefits of fuzzing to large language models by incorporating valid testing semantics and providing diverse coverage-guided inputs. Doubling the size of training datasets, FuzzAug improves performance over the baselines significantly. This technique demonstrates the potential of introducing prior knowledge from dynamic software analysis to improve neural test generation, offering significant enhancements in this task. Our code is open-sourced at https://github.com/SecurityLab-UCD/FuzzAug.

2023

Understanding Programs by Exploiting (Fuzzing) Test Cases
Jianyu Zhao | Yuyang Rong | Yiwen Guo | Yifeng He | Hao Chen
Findings of the Association for Computational Linguistics: ACL 2023

Semantic understanding of programs has attracted great attention in the community. Inspired by recent successes of large language models (LLMs) in natural language understanding, tremendous progress has been made by treating programming language as another sort of natural language and training LLMs on corpora of program code. However, programs are essentially different from texts after all, in the sense that they are normally heavily structured and syntax-strict. In particular, programs and their basic units (i.e., functions and subroutines) are designed to demonstrate a variety of behaviors and/or provide possible outputs, given different inputs. The relationship between inputs and possible outputs/behaviors represents the functions/subroutines and profiles the program as a whole. Hence, we propose to incorporate such a relationship into learning, for achieving a deeper semantic understanding of programs. To obtain inputs that are representative enough to trigger the execution of most of the code, we resort to fuzz testing and propose fuzz tuning to boost the performance of program understanding and code representation learning, given a pre-trained LLM. The effectiveness of the proposed method is verified on two program understanding tasks including code clone detection and code classification, and it outperforms the current state of the art by large margins. Code is available at https://github.com/rabbitjy/FuzzTuning.

Co-authors

Hongxiang Zhang 1

Tianyi Zhang 1

Jianyu Zhao 1

Venues

Findings 2
EMNLP 1