Chi Zhang


2024

pdf bib
Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data
Yanda Li | Chi Zhang | Gang Yu | Wanqi Yang | Zhibin Wang | Bin Fu | Guosheng Lin | Chunhua Shen | Ling Chen | Yunchao Wei
Findings of the Association for Computational Linguistics: ACL 2024

The remarkable multimodal capabilities demonstrated by OpenAI’s GPT-4 have sparked significant interest in the development of multimodal Large Language Models (LLMs). A primary research objective of such models is to align visual and textual modalities effectively while comprehending human instructions.Current methodologies often rely on annotations derived from benchmark datasets to construct image-dialogue datasets for training purposes, akin to instruction tuning in LLMs. However, these datasets often exhibit domain bias, potentially constraining the generative capabilities of the models. In an effort to mitigate these limitations, we propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning. This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models to yield a diverse and controllable dataset with varied image content. This not only provides greater flexibility compared to existing methodologies but also significantly enhances several model capabilities. Our research includes comprehensive experiments conducted on various datasets using the open-source LLAVA model as a testbed for our proposed pipeline. Our results underscore marked enhancements across more than ten commonly assessed capabilities.

2022

pdf bib
Empathetic and Emotionally Positive Conversation Systems with an Emotion-specific Query-Response Memory
Zhiliang Tian | Yinliang Wang | Yiping Song | Chi Zhang | Dongkyu Lee | Yingxiu Zhao | Dongsheng Li | Nevin L. Zhang
Findings of the Association for Computational Linguistics: EMNLP 2022

Emotional conversation systems generate responses for the input queries considering the speaker’s emotions in a conversation. Existing emotional conversation systems output emotional responses according to either a given emotion or the user’s emotion reflected in the input queries. Following a given emotion may lead to an emotional drift between the given emotion and the conversation state, and following only the user’s emotion may aggravate the user’s negative feelings if users suffer from a negative mood. In this paper, we propose to generate empathetic responses catering to the user’s emotions while leading the conversation to be emotionally positive. Particularly, by abstracting the conversation corpus, we extract and store the different responding strategies for different users’ emotions and conversational topics into a memory. We encourage positive emotions in conversation via a sentiment evaluator. We model the memory outputs with a Gaussian mixture distribution and sample a final responding strategy from the distribution. The strategy acts as a condition to a transformer model to generate responses. The experiments verify our model surpasses the baseline methods in appropriateness, diversity, and generating emotionally positive responses.