Che Zhang
2023
Mao-Zedong at SemEval-2023 Task 4: Label Represention Multi-Head Attention Model with Contrastive Learning-Enhanced Nearest Neighbor Mechanism for Multi-Label Text Classification
Che Zhang | Ping’an Liu | Zhenyang Xiao | Haojun Fei
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
This is our system description paper for the ValueEval task. The title is: Mao-Zedong At SemEval-2023 Task 4: Label Represention Multi-Head Attention Model With Contrastive Learning-Enhanced Nearest Neighbor Mechanism For Multi-Label Text Classification, and the authors are Che Zhang, Pingan Liu, Zhenyang Xiao, and Haojun Fei. In this paper, we propose a model that combines the label-specific attention network with the contrastive learning-enhanced nearest neighbor mechanism.
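As a rough illustration of the label-specific attention idea the abstract mentions (a minimal sketch, not the authors' code), the snippet below shows one common formulation: each label gets a learnable query vector that attends over the token representations of a document, and each resulting label-wise vector feeds a per-label binary classifier. All module names, dimensions, and design choices here are assumptions for illustration; the contrastive learning-enhanced nearest neighbor component is omitted.

```python
# Sketch of label-specific multi-head attention for multi-label classification
# (assumed formulation, not the paper's implementation).
import torch
import torch.nn as nn

class LabelAttentionClassifier(nn.Module):
    def __init__(self, hidden_dim: int, num_labels: int, num_heads: int = 4):
        super().__init__()
        # One learnable query vector per label (assumed design choice).
        self.label_embeddings = nn.Parameter(torch.randn(num_labels, hidden_dim))
        self.attention = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, 1)

    def forward(self, token_states: torch.Tensor) -> torch.Tensor:
        # token_states: (batch, seq_len, hidden_dim), e.g. encoder outputs.
        batch = token_states.size(0)
        queries = self.label_embeddings.unsqueeze(0).expand(batch, -1, -1)
        # Each label query attends over the tokens of the document.
        label_repr, _ = self.attention(queries, token_states, token_states)
        # One logit per label for multi-label prediction.
        return self.classifier(label_repr).squeeze(-1)  # (batch, num_labels)

# Example usage with random encoder outputs:
model = LabelAttentionClassifier(hidden_dim=768, num_labels=20)
logits = model(torch.randn(2, 128, 768))  # shape (2, 20)
```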
DialCoT Meets PPO: Decomposing and Exploring Reasoning Paths in Smaller Language Models
Chengcheng Han | Xiaowei Du | Che Zhang | Yixin Lian | Xiang Li | Ming Gao | Baoyuan Wang
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Chain-of-Thought (CoT) prompting has successfully enhanced the reasoning capabilities of Large Language Models (LLMs) with at least 100 billion parameters. However, it is ineffective, or even detrimental, to the performance on reasoning tasks in Smaller Language Models (SLMs) with less than 10 billion parameters. In this paper, we propose Dialogue-guided Chain-of-Thought (DialCoT) to improve the reasoning capabilities of SLMs, with the aim of generating intermediate reasoning steps in a dialogue format to guide the model to the final answer. Furthermore, we optimize the model to choose the optimal reasoning path through the Proximal Policy Optimization (PPO) algorithm, further enhancing its reasoning capabilities. Compared to previous methods, our advantages lie in: 1) We transform the process of solving complex reasoning problems into decomposing problems and solving a series of simpler sub-questions, significantly reducing task difficulty and making it more suitable for SLMs. 2) We optimize the model to choose the optimal reasoning path through the PPO algorithm. Comprehensive experiments on four arithmetic reasoning datasets show that our method can achieve significant performance gains over state-of-the-art competitors.
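To make the dialogue-guided decomposition concrete (a minimal sketch under assumptions, not the paper's implementation), the loop below prompts a model turn by turn to pose a sub-question, answer it, and only then commit to a final answer. The `generate` callable is a hypothetical stand-in for any small language model's text-generation call, and the PPO-based selection among alternative reasoning paths is omitted.

```python
# Sketch of dialogue-guided chain-of-thought decomposition (assumed prompting
# scheme for illustration; `generate` is a hypothetical SLM generation function).
from typing import Callable, List

def dialcot_answer(question: str, generate: Callable[[str], str], max_turns: int = 4) -> str:
    dialogue: List[str] = [f"Question: {question}"]
    for _ in range(max_turns):
        # Ask the model to pose the next sub-question that decomposes the problem.
        sub_q = generate("\n".join(dialogue) + "\nNext sub-question:")
        dialogue.append(f"Sub-question: {sub_q}")
        # Ask the model to answer that sub-question in the dialogue context.
        sub_a = generate("\n".join(dialogue) + "\nAnswer to the sub-question:")
        dialogue.append(f"Sub-answer: {sub_a}")
    # Final answer conditioned on the full decomposition dialogue.
    return generate("\n".join(dialogue) + "\nFinal answer:")
```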