Jingwen Wang


2024

pdf bib
Self-supervised Preference Optimization: Enhance Your Language Model with Preference Degree Awareness
Jian Li | Haojing Huang | Yujia Zhang | Pengfei Xu | Xi Chen | Rui Song | Lida Shi | Jingwen Wang | Hao Xu
Findings of the Association for Computational Linguistics: EMNLP 2024

Recently, there has been significant interest in replacing the reward model in Reinforcement Learning with Human Feedback (RLHF) methods for Large Language Models (LLMs), such as Direct Preference Optimization (DPO) and its variants. These approaches commonly use a binary cross-entropy mechanism on pairwise samples, i.e., minimizing and maximizing the loss based on preferred or dis-preferred responses, respectively. However, while this training strategy omits the reward model, it also overlooks the varying preference degrees within different responses. We hypothesize that this is a key factor hindering LLMs from sufficiently understanding human preferences. To address this problem, we propose a novel Self-supervised Preference Optimization (SPO) framework, which constructs a self-supervised preference degree loss combined with the alignment loss, thereby helping LLMs improve their ability to understand the degree of preference. Extensive experiments are conducted on two widely used datasets of different tasks. The results demonstrate that SPO can be seamlessly integrated with existing preference optimization methods and significantly boost their performance to achieve state-of-the-art performance. We also conduct detailed analyses to offer comprehensive insights into SPO, which verifies its effectiveness. The code is available at https://github.com/lijian16/SPO.

2021

pdf bib
Integrating Semantics and Neighborhood Information with Graph-Driven Generative Models for Document Retrieval
Zijing Ou | Qinliang Su | Jianxing Yu | Bang Liu | Jingwen Wang | Ruihui Zhao | Changyou Chen | Yefeng Zheng
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

With the need of fast retrieval speed and small memory footprint, document hashing has been playing a crucial role in large-scale information retrieval. To generate high-quality hashing code, both semantics and neighborhood information are crucial. However, most existing methods leverage only one of them or simply combine them via some intuitive criteria, lacking a theoretical principle to guide the integration process. In this paper, we encode the neighborhood information with a graph-induced Gaussian distribution, and propose to integrate the two types of information with a graph-driven generative model. To deal with the complicated correlations among documents, we further propose a tree-structured approximation method for learning. Under the approximation, we prove that the training objective can be decomposed into terms involving only singleton or pairwise documents, enabling the model to be trained as efficiently as uncorrelated ones. Extensive experimental results on three benchmark datasets show that our method achieves superior performance over state-of-the-art methods, demonstrating the effectiveness of the proposed model for simultaneously preserving semantic and neighborhood information.

2015

pdf bib
KWB: An Automated Quick News System for Chinese Readers
Yiqi Bai | Wenjing Yang | Hao Zhang | Jingwen Wang | Ming Jia | Roland Tong | Jie Wang
Proceedings of the Eighth SIGHAN Workshop on Chinese Language Processing