Shiping Gao

2025

pdf bib abs
Edit-Wise Preference Optimization for Grammatical Error Correction
Jiehao Liang | Haihui Yang | Shiping Gao | Xiaojun Quan
Proceedings of the 31st International Conference on Computational Linguistics

While large language models (LLMs) have achieved remarkable success in various natural language processing tasks, their strengths have yet to be fully demonstrated in grammatical error correction (GEC). This is partly due to the misalignment between their pre-training objectives and the GEC principle of making minimal edits. In this work, we aim to bridge this gap by introducing a novel method called Edit-wise Preference Optimization (EPO). By distinguishing the importance of different tokens and assigning higher reward weights to edit tokens during preference optimization, our method captures fine-grained distinctions in GEC that traditional preference learning often overlooks. Extensive experiments on both English and Chinese datasets show that our framework consistently outperforms strong baselines, achieving state-of-the-art performance and demonstrating the advantages of LLMs in GEC.

2024

pdf bib abs
Self-Evolution Fine-Tuning for Policy Optimization
Ruijun Chen | Jiehao Liang | Shiping Gao | Fanqi Wan | Xiaojun Quan
Findings of the Association for Computational Linguistics: EMNLP 2024

The alignment of large language models (LLMs) is crucial not only for unlocking their potential in specific tasks but also for ensuring that responses meet human expectations and adhere to safety and ethical principles. To address the challenges of current alignment methodologies, we introduce self-evolution fine-tuning (SEFT) for LLM alignment, aiming to eliminate the need for annotated samples while retaining the stability and efficiency of SFT. SEFT first trains an adaptive reviser to elevate low-quality responses while maintaining high-quality ones. The reviser then gradually guides the policy’s optimization by fine-tuning it with enhanced responses. The method excels in utilizing unlimited unannotated data to optimize policies via supervised fine-tuning. Our experiments on AlpacaEval and MT-Bench demonstrate the effectiveness of SEFT and its advantages over existing alignment techniques.

Co-authors

Venues

coling1
findings1

Fix data