Dong Won Lee
2024
Global Reward to Local Rewards: Multimodal-Guided Decomposition for Improving Dialogue Agents
Dong Won Lee | Hae Won Park | Yoon Kim | Cynthia Breazeal | Louis-Philippe Morency
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
We describe an approach for aligning an LLM-based dialogue agent for long-term social dialogue, where only a single global score is given by the user at the end of the session. In this paper, we propose using denser, naturally occurring multimodal communicative signals as local implicit feedback to improve turn-level utterance generation. Our approach (dubbed GELI) learns a local, turn-level reward model by decomposing the human-provided Global Explicit (GE) session-level reward, using Local Implicit (LI) multimodal reward signals to cross-modally shape the reward decomposition step. This decomposed reward model is then used as part of the RLHF pipeline to improve an LLM-based dialogue agent. We run quantitative and qualitative human studies on two large-scale datasets to evaluate the performance of our GELI approach, and find that it shows consistent improvements across various conversational metrics compared to baseline methods.
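To illustrate the decomposition idea described in the abstract, the following is a minimal, hypothetical PyTorch sketch: a small turn-level reward model is trained so that its per-turn rewards sum to the single session-level score while also tracking a per-turn multimodal signal. All names, losses, and hyperparameters here are assumptions for illustration and do not reproduce the GELI method or its actual objectives.

```python
# Hypothetical sketch of global-to-local reward decomposition shaped by a
# local multimodal signal. Illustrative only; not the GELI implementation.
import torch
import torch.nn as nn

class TurnRewardModel(nn.Module):
    """Scores each turn embedding with a scalar reward (hypothetical architecture)."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, turn_embeddings: torch.Tensor) -> torch.Tensor:
        # turn_embeddings: (num_turns, dim) -> per-turn rewards of shape (num_turns,)
        return self.net(turn_embeddings).squeeze(-1)

def decomposition_loss(local_rewards, global_score, implicit_signal, alpha=0.5):
    # Two illustrative terms:
    # (1) the per-turn rewards should add up to the single session-level score;
    # (2) each turn's reward should track a local implicit multimodal signal
    #     (e.g., a per-turn affect estimate), weighted by alpha.
    sum_constraint = (local_rewards.sum() - global_score) ** 2
    shaping = ((local_rewards - implicit_signal) ** 2).mean()
    return sum_constraint + alpha * shaping

if __name__ == "__main__":
    torch.manual_seed(0)
    model = TurnRewardModel(dim=16)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    turns = torch.randn(8, 16)        # toy embeddings for an 8-turn session
    global_score = torch.tensor(5.0)  # single rating given at the end of the session
    implicit = torch.rand(8)          # toy per-turn multimodal (implicit) signal

    for _ in range(200):
        optimizer.zero_grad()
        loss = decomposition_loss(model(turns), global_score, implicit)
        loss.backward()
        optimizer.step()
```

The trained model's per-turn outputs could then stand in for a dense reward during RL fine-tuning, which is the role the decomposed reward model plays in the pipeline described above.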
2020
No Gestures Left Behind: Learning Relationships between Spoken Language and Freeform Gestures
Chaitanya Ahuja | Dong Won Lee | Ryo Ishii | Louis-Philippe Morency
Findings of the Association for Computational Linguistics: EMNLP 2020
We study relationships between spoken language and co-speech gestures in the context of two key challenges. First, the distributions of text and gestures are inherently skewed, making it important to model the long tail. Second, gesture predictions are made at a subword level, making it important to learn relationships between language and acoustic cues. We introduce AISLe, which combines adversarial learning with importance sampling to strike a balance between precision and coverage. We propose the use of a multimodal multiscale attention block to perform subword alignment without the need for explicit alignment between language and acoustic cues. Finally, to empirically study the importance of language in this task, we extend the dataset proposed in Ahuja et al. (2020) with automatically extracted transcripts for the audio signals. We substantiate the effectiveness of our approach through large-scale quantitative and user studies, which show that our proposed methodology significantly outperforms previous state-of-the-art approaches for gesture generation. Link to code, data and videos: https://github.com/chahuja/aisle
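The long-tail rebalancing piece of the above can be illustrated with a short, hypothetical PyTorch sketch: examples are drawn with sampling weights inversely proportional to the frequency of their gesture cluster, so rare gestures appear more often in training batches. The clustering, toy data, and weighting scheme below are assumptions for illustration and are not the AISLe implementation, which couples importance sampling with adversarial training.

```python
# Hypothetical sketch of long-tail rebalancing via weighted sampling.
# Illustrative only; not the AISLe implementation.
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Toy data: 1000 examples, each assigned to one of 10 gesture clusters drawn
# from a heavily skewed (long-tailed) cluster distribution.
torch.manual_seed(0)
features = torch.randn(1000, 32)
cluster_probs = torch.tensor([0.5, 0.2, 0.1, 0.05, 0.05, 0.04, 0.03, 0.01, 0.01, 0.01])
cluster_ids = torch.multinomial(cluster_probs, 1000, replacement=True)

# Importance weights: rare clusters get proportionally higher sampling weight.
counts = torch.bincount(cluster_ids, minlength=10).float()
weights = (1.0 / counts)[cluster_ids]

sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
loader = DataLoader(TensorDataset(features, cluster_ids), batch_size=64, sampler=sampler)

# Batches drawn from `loader` cover tail clusters far more often than uniform
# sampling would, which is the precision/coverage trade-off the abstract refers to.
for batch_features, batch_clusters in loader:
    pass  # a training step (e.g., the adversarial update) would go here
```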