Tai-Wei Chang


2024

Optimizing Language Models with Fair and Stable Reward Composition in Reinforcement Learning
Jiahui Li | Hanlin Zhang | Fengda Zhang | Tai-Wei Chang | Kun Kuang | Long Chen | Jun Zhou
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Reinforcement learning from human feedback (RLHF) and AI-generated feedback (RLAIF) have become prominent techniques that significantly enhance the functionality of pre-trained language models (LMs). These methods harness feedback, sourced either from humans or AI, as direct rewards or to shape reward models that steer LM optimization. Nonetheless, the effective integration of rewards from diverse sources presents a significant challenge due to their disparate characteristics. To address this, recent research has developed algorithms incorporating strategies such as weighting, ranking, and constraining to handle this complexity. Despite these innovations, a bias toward disproportionately high rewards can still skew the reinforcement learning process and negatively impact LM performance. This paper explores a methodology for reward composition that enables simultaneous improvements in LMs across multiple dimensions. Inspired by fairness theory, we introduce a training algorithm that aims to reduce disparity and enhance stability among various rewards. Our method treats the aggregate reward as a dynamic weighted sum of individual rewards, with alternating updates to the weights and model parameters. For efficient and straightforward implementation, we employ an estimation technique rooted in the mirror descent method for weight updates, eliminating the need for gradient computations. The empirical results under various types of rewards across a wide range of scenarios demonstrate the effectiveness of our method.
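
The alternating scheme described in the abstract (a weighted sum of rewards whose weights are updated by a mirror-descent-style rule) can be illustrated with a small sketch. This is not the paper's algorithm: it assumes the common entropic form of mirror descent on the probability simplex (the exponentiated-gradient update) and assumes that weight should shift toward rewards that are currently low. The names `mirror_descent_step`, `composite_reward`, and `step_size` are hypothetical.

```python
import math


def mirror_descent_step(weights, rewards, step_size=0.5):
    """One entropic mirror-descent (exponentiated-gradient) step on the simplex.

    Assumption: the weights minimize the weighted sum of rewards, so
    lower-scoring rewards receive more weight. The gradient of that linear
    objective with respect to the weights is the reward vector itself, so
    no backpropagation is needed. Weights must be strictly positive.
    """
    # Multiplicative update in log space: log w_i - step_size * r_i
    logits = [math.log(w) - step_size * r for w, r in zip(weights, rewards)]
    # Subtract the maximum for numerical stability before exponentiating
    top = max(logits)
    unnormalized = [math.exp(v - top) for v in logits]
    total = sum(unnormalized)
    # Renormalize so the weights stay on the probability simplex
    return [u / total for u in unnormalized]


def composite_reward(weights, rewards):
    """Aggregate reward as a weighted sum of the individual rewards."""
    return sum(w * r for w, r in zip(weights, rewards))


# Illustrative values only: three reward sources, uniform initial weights
weights = [1 / 3, 1 / 3, 1 / 3]
rewards = [0.9, 0.2, 0.5]
new_weights = mirror_descent_step(weights, rewards)
```

In a full training loop, a step like this would alternate with a policy update that uses `composite_reward` as its training signal. How the paper estimates the update and schedules the alternation is not stated in the abstract.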

2014

Interpretation of Chinese Discourse Connectives for Explicit Discourse Relation Recognition
Hen-Hsen Huang | Tai-Wei Chang | Huan-Yuan Chen | Hsin-Hsi Chen
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

2013

Analyses of the Association between Discourse Relation and Sentiment Polarity with a Chinese Human-Annotated Corpus
Hen-Hsen Huang | Chi-Hsin Yu | Tai-Wei Chang | Cong-Kai Lin | Hsin-Hsi Chen
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse