Sheng Chen
2025
CoTD-PO: Chain-of-Thought Distillation with Preference Optimization
Lujie Niu | Haochen Sun | Fangkun Zhao | Sheng Chen | Zimeng Bai | Jiawei Zhang | Caixia Yuan | Xiaojie Wang
Findings of the Association for Computational Linguistics: EMNLP 2025
Chain-of-Thought (CoT) distillation has emerged as a promising paradigm to enhance the reasoning ability of small language models by imitating the reasoning and outputs of larger teacher models. However, existing approaches suffer from a critical limitation: a distribution mismatch between teacher-generated training trajectories and the student model’s own generative distribution. This mismatch leads to exposure bias during inference and often induces mode collapse or mode averaging, thereby degrading the student model’s generative diversity and robustness. To address these issues, we propose CoTD-PO (Chain-of-Thought Distillation with Preference Optimization), a reinforcement learning framework that shifts the training paradigm from passive imitation to active trajectory exploration. Instead of forcing the student to imitate exact teacher traces, our method enables the student to sample its own answer paths. To support training with non-open-source teacher models, we approximate the teacher’s output distribution through preference-based scoring. Furthermore, we adopt an offline iterative training procedure that enables stable and efficient optimization. Experiments on diverse open-ended generation tasks demonstrate that CoTD-PO significantly outperforms standard CoT distillation baselines, achieving higher output quality while mitigating mode collapse and preserving semantic diversity.
2022
A GlobalPointer based Robust Approach for Information Extraction from Dialog Transcripts
Yanbo J. Wang | Sheng Chen | Hengxing Cai | Wei Wei | Kuo Yan | Zhe Sun | Hui Qin | Yuming Li | Xiaochen Cai
Proceedings of the Towards Semi-Supervised and Reinforced Task-Oriented Dialog Systems (SereTOD)
With the widespread adoption of intelligent technology, task-oriented dialogue (TOD) systems are increasingly being applied to a wide variety of practical scenarios. As key tasks in dialogue systems, named entity recognition and slot filling play a crucial role in the completeness and accuracy of information extraction. This paper is an evaluation paper for the SereTOD 2022 Workshop challenge (Track 1: Information extraction from dialog transcripts). We propose a multi-model fusion approach based on GlobalPointer, combined with several optimisation tricks, which achieved an entity F1 of 60.73, an entity-slot-value triple F1 of 56, and an average F1 of 58.37, the highest score in the SereTOD 2022 Workshop challenge.
2017
DocTag2Vec: An Embedding Based Multi-label Learning Approach for Document Tagging
Sheng Chen | Akshay Soni | Aasish Pappu | Yashar Mehdad
Proceedings of the 2nd Workshop on Representation Learning for NLP
In this work, document tagging refers to tagging news articles or blog posts with relevant tags from a predefined collection. Accurate tagging of articles can benefit several downstream applications such as recommendation and search. We propose a novel yet simple approach called DocTag2Vec to accomplish this task. We substantially extend Word2Vec and Doc2Vec, two popular models for learning distributed representations of words and documents. In DocTag2Vec, we simultaneously learn the representations of words, documents, and tags in a joint vector space during training, and employ a simple k-nearest neighbor search to predict tags for unseen documents. In contrast to previous multi-label learning methods, DocTag2Vec deals directly with raw text instead of provided feature vectors, and in addition offers advantages such as learned tag representations and the ability to handle newly created tags. To demonstrate the effectiveness of our approach, we conduct experiments on several datasets and show promising results against state-of-the-art methods.