Global Reward to Local Rewards: Multimodal-Guided Decomposition for Improving Dialogue Agents

Dong Won Lee; Hae Won Park; Yoon Kim; Cynthia Breazeal; Louis-Philippe Morency

doi:10.18653/v1/2024.emnlp-main.881

Abstract

We describe an approach for aligning an LLM-based dialogue agent for long-term social dialogue, where only a single global score is given by the user at the end of the session. We propose using denser, naturally occurring multimodal communicative signals as local implicit feedback to improve turn-level utterance generation. Our approach (dubbed GELI) learns a local, turn-level reward model by decomposing the human-provided Global Explicit (GE) session-level reward, using Local Implicit (LI) multimodal reward signals to crossmodally shape the reward decomposition step. This decomposed reward model is then used as part of the RLHF pipeline to improve an LLM-based dialogue agent. We run quantitative and qualitative human studies on two large-scale datasets to evaluate GELI, and find that it shows consistent improvements across various conversational metrics compared to baseline methods.
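To make the idea of decomposing a session-level score into turn-level rewards more concrete, here is a minimal sketch of one way such an objective could be written. The abstract does not specify the loss, so everything here is an assumption for illustration: the sum-based decomposition, the squared-error terms, the `weight` parameter, and the function names are all hypothetical and are not taken from the paper.

```python
# Hypothetical sketch: names, loss forms, and weighting are assumptions,
# not the authors' implementation.

def decomposition_loss(turn_rewards, global_score):
    """Squared error between the summed turn-level rewards and the
    single session-level (global explicit) score."""
    return (sum(turn_rewards) - global_score) ** 2


def shaping_loss(turn_rewards, implicit_signals):
    """Mean squared error between each turn-level reward and a per-turn
    implicit signal (e.g. a score derived from multimodal cues)."""
    pairs = list(zip(turn_rewards, implicit_signals))
    return sum((r - s) ** 2 for r, s in pairs) / len(pairs)


def combined_loss(turn_rewards, global_score, implicit_signals, weight=1.0):
    """Global decomposition term plus a weighted local shaping term."""
    return (decomposition_loss(turn_rewards, global_score)
            + weight * shaping_loss(turn_rewards, implicit_signals))


# Toy example: three turns whose predicted rewards sum to the session score.
predicted = [0.5, 1.0, 1.5]
loss = combined_loss(predicted, global_score=3.0,
                     implicit_signals=[1.0, 1.0, 1.0])
```

In this toy example the decomposition term is zero because the turn rewards already sum to the session score, so the remaining loss comes only from the disagreement with the per-turn implicit signals.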

Anthology ID: 2024.emnlp-main.881
Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 15737–15762
URL: https://aclanthology.org/2024.emnlp-main.881/
DOI: 10.18653/v1/2024.emnlp-main.881
Cite (ACL): Dong Won Lee, Hae Won Park, Yoon Kim, Cynthia Breazeal, and Louis-Philippe Morency. 2024. Global Reward to Local Rewards: Multimodal-Guided Decomposition for Improving Dialogue Agents. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 15737–15762, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): Global Reward to Local Rewards: Multimodal-Guided Decomposition for Improving Dialogue Agents (Lee et al., EMNLP 2024)
PDF: https://aclanthology.org/2024.emnlp-main.881.pdf
