Wang Chen


2024

Don’t Forget Your Reward Values: Language Model Alignment via Value-based Calibration
Xin Mao | Feng-Lin Li | Huimin Xu | Wei Zhang | Wang Chen | Anh Tuan Luu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

While Reinforcement Learning from Human Feedback (RLHF) significantly enhances the generation quality of Large Language Models (LLMs), recent studies have raised concerns regarding the complexity and instability associated with the Proximal Policy Optimization (PPO) algorithm, and have proposed a series of order-based alignment methods as viable alternatives. This paper delves into existing order-based methods, unifying them into one framework and examining their inefficiencies in utilizing reward values. Building upon these findings, we propose a new Value-based Calibration (VCB) method to better align LLMs with human preferences. Experimental results demonstrate that VCB surpasses existing alignment methods on AI assistant and summarization datasets, providing impressive generalizability, robustness, and diversity in different settings.
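The inefficiency the abstract points to, namely that order-based objectives keep only the ranking of two responses and discard the size of the reward gap, can be seen in a toy snippet. The sketch below is illustrative only and does not reproduce the paper's VCB objective:

```python
import torch
import torch.nn.functional as F

def order_based_loss(logp_a, logp_b, reward_a, reward_b):
    """Pairwise ranking loss: only the ORDER of the two rewards matters,
    so reward pairs (3.0, 1.0) and (1.01, 1.0) yield the same loss."""
    chosen, rejected = (logp_a, logp_b) if reward_a > reward_b else (logp_b, logp_a)
    return -F.logsigmoid(chosen - rejected)

def value_weighted_loss(logp_a, logp_b, reward_a, reward_b):
    """Toy value-aware variant (NOT the paper's VCB objective): scale the
    same ranking loss by the reward gap, so near-ties matter less than
    decisive wins."""
    gap = abs(reward_a - reward_b)
    return gap * order_based_loss(logp_a, logp_b, reward_a, reward_b)

logp_a, logp_b = torch.tensor(-4.2), torch.tensor(-4.5)
print(order_based_loss(logp_a, logp_b, 3.0, 1.0),
      order_based_loss(logp_a, logp_b, 1.01, 1.0))     # identical losses
print(value_weighted_loss(logp_a, logp_b, 3.0, 1.0),
      value_weighted_loss(logp_a, logp_b, 1.01, 1.0))  # losses now differ
```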

2023

Enhancing Ontology Knowledge for Domain-Specific Joint Entity and Relation Extraction
Xiong Xiong | Wang Chen | Liu Yunfei | Li Shengyang
Proceedings of the 22nd Chinese National Conference on Computational Linguistics

Pre-trained language models (PLMs) have been widely used in entity and relation extraction methods in recent years. However, due to the semantic gap between general-domain text used for pre-training and domain-specific text, these methods encounter semantic redundancy and domain semantics insufficiency when it comes to domain-specific tasks. To mitigate this issue, we propose a low-cost and effective knowledge-enhanced method to facilitate domain-specific semantics modeling in joint entity and relation extraction. Precisely, we use ontology and entity type descriptions as domain knowledge sources, which are encoded and incorporated into the downstream entity and relation extraction model to improve its understanding of domain-specific information. We construct a dataset called SSUIE-RE for Chinese entity and relation extraction in the space science and utilization domain of China Manned Space Engineering, which contains a wealth of domain-specific knowledge. The experimental results on SSUIE-RE demonstrate the effectiveness of our method, achieving a 1.4% absolute improvement in relation F1 score over the previous best approach.
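As a rough illustration of the knowledge-enhancement idea, the following sketch encodes entity-type descriptions once and lets token representations attend to them; the module and its fusion scheme are hypothetical stand-ins, not the paper's architecture:

```python
import torch
import torch.nn as nn

class KnowledgeFusion(nn.Module):
    """Hypothetical sketch: attend from token representations to encoded
    entity-type descriptions and add the attended knowledge back in."""
    def __init__(self, hidden: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)
        self.norm = nn.LayerNorm(hidden)

    def forward(self, tokens, type_desc_emb):
        # tokens: (B, L, H); type_desc_emb: (n_types, H), encoded once
        # from the ontology's entity-type description text with a PLM.
        kv = type_desc_emb.unsqueeze(0).expand(tokens.size(0), -1, -1)
        knowledge, _ = self.attn(query=tokens, key=kv, value=kv)
        return self.norm(tokens + knowledge)  # knowledge-enhanced tokens

# Usage with random placeholders, for shape checking only:
fusion = KnowledgeFusion(hidden=768)
out = fusion(torch.randn(2, 128, 768), torch.randn(12, 768))
print(out.shape)  # torch.Size([2, 128, 768])
```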

2022

Social-aware Sparse Attention Network for Session-based Social Recommendation
Kai Ouyang | Xianghong Xu | Chen Tang | Wang Chen | Haitao Zheng
Findings of the Association for Computational Linguistics: EMNLP 2022

Session-based Social Recommendation (SSR) aims to use users’ social networks and historical sessions to provide more personalized recommendations for the current session. Unfortunately, existing SSR methods have two limitations. First, they do not screen users’ useless social relationships and noisy irrelevant interactions. However, user preferences are mainly affected by several close friends and key interactions. Second, when modeling the current session, they do not take full advantage of user preference information. To tackle these issues, we propose a novel Social-aware Sparse Attention Network for SSR, abbreviated as SSAN. It mainly consists of the Heterogeneous Graph Embedding (HGE) module and the Social-aware Encoder-decoder Network (SEN) module. In the HGE module, we adopt a modified heterogeneous graph neural network, which focuses more on close friends and key historical interactions, to enhance user/item representations. In the SEN module, we use the user representation as a bridge between the Encoder and Decoder to incorporate user preferences when modeling the current session. Extensive experiments on two benchmark datasets demonstrate the superiority of SSAN over the state-of-the-art models.
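One plausible reading of "focuses more on close friends and key historical interactions" is a top-k sparse attention over a user's neighbors; the sketch below shows that generic mechanism, not SSAN's exact formulation:

```python
import torch

def topk_sparse_attention(query, keys, values, k=5):
    """Generic top-k sparse attention: only the k most relevant neighbors
    receive nonzero weight, so weak or noisy ties contribute nothing."""
    scores = keys @ query / keys.size(-1) ** 0.5   # (N,) relevance scores
    k = min(k, scores.size(0))
    mask = torch.full_like(scores, float("-inf"))
    mask[scores.topk(k).indices] = 0.0
    weights = torch.softmax(scores + mask, dim=0)  # zero outside the top-k
    return weights @ values                        # (H,) aggregated neighbors

user = torch.randn(64)            # current user/session query vector
friends = torch.randn(50, 64)     # representations of 50 social neighbors
out = topk_sparse_attention(user, friends, friends, k=5)
print(out.shape)  # torch.Size([64])
```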

2021

A Training-free and Reference-free Summarization Evaluation Metric via Centrality-weighted Relevance and Self-referenced Redundancy
Wang Chen | Piji Li | Irwin King
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In recent years, reference-based and supervised summarization evaluation metrics have been widely explored. However, collecting human-annotated references and ratings is costly and time-consuming. To avoid these limitations, we propose a training-free and reference-free summarization evaluation metric. Our metric consists of a centrality-weighted relevance score and a self-referenced redundancy score. The relevance score is computed between the pseudo reference built from the source document and the given summary, where the pseudo reference content is weighted by the sentence centrality to provide importance guidance. Besides an F1-based relevance score, we also design an F𝛽-based variant that pays more attention to the recall score. As for the redundancy score of the summary, we compute a self-masked similarity score with the summary itself to evaluate the redundant information in the summary. Finally, we combine the relevance and redundancy scores to produce the final evaluation score of the given summary. Extensive experiments show that our methods can significantly outperform existing methods on both multi-document and single-document summarization evaluation. The source code is released at https://github.com/Chen-Wang-CUHK/Training-Free-and-Ref-Free-Summ-Evaluation.
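A simplified sketch of the metric's structure (centrality-weighted relevance minus self-masked redundancy) is given below; it assumes a sentence-embedding function producing unit-norm vectors and is not the released implementation:

```python
import numpy as np

def evaluate_summary(doc_sents, summ_sents, embed, beta=2.0):
    """Simplified sketch: `embed` maps a list of sentences to unit-norm
    row vectors. Assumes the document and summary each have >= 2 sentences."""
    D, S = embed(doc_sents), embed(summ_sents)      # (n, d) and (m, d)
    # Sentence centrality: average similarity to the rest of the document.
    sim_dd = D @ D.T
    centrality = (sim_dd.sum(axis=1) - 1.0) / (len(doc_sents) - 1)
    w = centrality / centrality.sum()               # importance weights
    sim_ds = D @ S.T                                # doc-summary similarities
    recall = float(w @ sim_ds.max(axis=1))          # pseudo-ref covered by summary
    precision = float(sim_ds.max(axis=0).mean())    # summary grounded in pseudo-ref
    # F-beta (beta > 1 emphasizes recall, as in the abstract's variant).
    relevance = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
    # Self-masked redundancy: each summary sentence vs. the other sentences.
    sim_ss = S @ S.T
    np.fill_diagonal(sim_ss, -np.inf)               # mask self-similarity
    redundancy = float(sim_ss.max(axis=1).mean())
    return relevance - redundancy
```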

2020

Exclusive Hierarchical Decoding for Deep Keyphrase Generation
Wang Chen | Hou Pong Chan | Piji Li | Irwin King
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Keyphrase generation (KG) aims to summarize the main ideas of a document into a set of keyphrases. A new setting is recently introduced into this problem, in which, given a document, the model needs to predict a set of keyphrases and simultaneously determine the appropriate number of keyphrases to produce. Previous work in this setting employs a sequential decoding process to generate keyphrases. However, such a decoding method ignores the intrinsic hierarchical compositionality existing in the keyphrase set of a document. Moreover, previous work tends to generate duplicated keyphrases, which wastes time and computing resources. To overcome these limitations, we propose an exclusive hierarchical decoding framework that includes a hierarchical decoding process and either a soft or a hard exclusion mechanism. The hierarchical decoding process explicitly models the hierarchical compositionality of a keyphrase set. Both the soft and the hard exclusion mechanisms keep track of previously-predicted keyphrases within a window size to enhance the diversity of the generated keyphrases. Extensive experiments on multiple KG benchmark datasets demonstrate the effectiveness of our method in generating fewer duplicated and more accurate keyphrases.
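The exclusion mechanisms can be pictured as logit masking at the start of each new keyphrase; the sketch below is a loose rendering of the hard and soft variants, with the window bookkeeping left out:

```python
import torch

def apply_exclusion(logits, prev_first_tokens, mode="hard", penalty=5.0):
    """Sketch of the exclusion idea when starting a new keyphrase: tokens
    that began any recently generated keyphrase (within the window) are
    blocked (hard) or down-weighted (soft). Details follow the paper
    only loosely."""
    logits = logits.clone()
    idx = torch.tensor(sorted(prev_first_tokens), dtype=torch.long)
    if mode == "hard":
        logits[idx] = float("-inf")   # can never be generated again
    else:
        logits[idx] -= penalty        # discouraged but still possible
    return logits

vocab_logits = torch.randn(50000)
window = {1023, 778, 42}              # first tokens of the last 3 keyphrases
masked = apply_exclusion(vocab_logits, window, mode="soft")
```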

2019

Neural Keyphrase Generation via Reinforcement Learning with Adaptive Rewards
Hou Pong Chan | Wang Chen | Lu Wang | Irwin King
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Generating keyphrases that summarize the main points of a document is a fundamental task in natural language processing. Although existing generative models are capable of predicting multiple keyphrases for an input document as well as determining the number of keyphrases to generate, they still suffer from the problem of generating too few keyphrases. To address this problem, we propose a reinforcement learning (RL) approach for keyphrase generation, with an adaptive reward function that encourages a model to generate both sufficient and accurate keyphrases. Furthermore, we introduce a new evaluation method that incorporates name variations of the ground-truth keyphrases using the Wikipedia knowledge base. Thus, our evaluation method can more robustly evaluate the quality of predicted keyphrases. Extensive experiments on five real-world datasets of different scales demonstrate that our RL approach consistently and significantly improves the performance of the state-of-the-art generative models with both conventional and new evaluation methods.
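One way to realize a reward that encourages both sufficient and accurate keyphrases, in the spirit of the abstract, is to reward recall while the prediction set is still smaller than the ground truth and F1 afterwards; the sketch below uses exact string matching as a stand-in for the paper's matching and name-variation handling:

```python
def adaptive_reward(predicted, ground_truth):
    """Sketch: while the model has produced fewer keyphrases than the
    ground truth, reward recall (push for SUFFICIENT generation); once
    enough are produced, reward F1 (push for ACCURATE generation)."""
    pred, gold = set(predicted), set(ground_truth)
    if not pred or not gold:
        return 0.0
    correct = len(pred & gold)
    recall = correct / len(gold)
    if len(pred) < len(gold):          # insufficient: generate more
        return recall
    precision = correct / len(pred)    # sufficient: now be accurate
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = ["neural networks", "keyphrase generation"]
print(adaptive_reward(["neural networks"], gold))        # 0.5 (recall regime)
print(adaptive_reward(["neural networks", "rl"], gold))  # 0.5 (F1 regime)
```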

An Integrated Approach for Keyphrase Generation via Exploring the Power of Retrieval and Extraction
Wang Chen | Hou Pong Chan | Piji Li | Lidong Bing | Irwin King
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

In this paper, we present a novel integrated approach for keyphrase generation (KG). Unlike previous works which are purely extractive or generative, we first propose a new multi-task learning framework that jointly learns an extractive model and a generative model. Besides extracting keyphrases, the output of the extractive model is also employed to rectify the copy probability distribution of the generative model, such that the generative model can better identify important contents from the given document. Moreover, we retrieve similar documents with the given document from training data and use their associated keyphrases as external knowledge for the generative model to produce more accurate keyphrases. For further exploiting the power of extraction and retrieval, we propose a neural-based merging module to combine and re-rank the predicted keyphrases from the enhanced generative model, the extractive model, and the retrieved keyphrases. Experiments on the five KG benchmarks demonstrate that our integrated approach outperforms the state-of-the-art methods.
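The merging step can be pictured as pooling scored candidates from the three sources, deduplicating, and re-ranking; the toy function below assumes comparable scores across sources, whereas the paper learns the ranking with a neural module:

```python
def merge_and_rerank(generated, extracted, retrieved, top_n=10):
    """Toy stand-in for the neural merging module: pool (keyphrase, score)
    pairs from the generative model, the extractive model, and retrieval,
    deduplicate by keeping each phrase's best score, then rank."""
    best = {}
    for source in (generated, extracted, retrieved):
        for phrase, score in source:
            key = phrase.lower().strip()
            best[key] = max(best.get(key, float("-inf")), score)
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return [phrase for phrase, _ in ranked[:top_n]]

print(merge_and_rerank(
    generated=[("keyphrase generation", 0.9), ("copy mechanism", 0.4)],
    extracted=[("keyphrase generation", 0.7), ("multi-task learning", 0.6)],
    retrieved=[("sequence-to-sequence", 0.5)],
    top_n=3,
))  # ['keyphrase generation', 'multi-task learning', 'sequence-to-sequence']
```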