2025
An Automatic Method to Estimate Correctness of RAG
Chi Zhang | Vivek V. Datla | Aditya Shrivastava | Alfy Samuel | Zhiqi Huang | Anoop Kumar | Daben Liu
Proceedings of the 31st International Conference on Computational Linguistics: Industry Track
In sectors where data quality is critical, such as finance and healthcare, it is crucial to have confidence not only in the outputs generated by retrieval-augmented generation (RAG) models but also in the process the model follows to arrive at those outputs. Existing methods, such as hallucination detection and input-output entailment measurements, fail to capture the model’s internal state during answer generation. This paper introduces a novel approach to predict the correctness of the generated answer by modeling the model’s uncertainty under quantified perturbations of the input. Extensive experiments across multiple large language models (LLMs) demonstrate that our approach quantifies RAG robustness, aligning predictions with ground truth at an average Mean Square Error (MSE) of 0.002 while offering flexibility for diverse qualitative metrics.
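The abstract does not specify the perturbation or uncertainty measures, but the idea of estimating answer correctness from a model’s sensitivity to quantified input perturbations can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: perturbation is modeled as randomly dropping retrieved sentences, uncertainty as the spread of a placeholder answer-confidence score, and correctness as stability under increasing perturbation; answer_score is a hypothetical stand-in for querying the RAG model.

```python
# Minimal sketch (not the paper's implementation): estimate answer correctness
# from a model's sensitivity to quantified perturbations of the retrieved input.

import random
from statistics import mean, pstdev

def perturb_context(sentences, drop_rate, rng):
    """Drop a fraction of retrieved sentences to form a perturbed context."""
    kept = [s for s in sentences if rng.random() > drop_rate]
    return kept or sentences[:1]  # never return an empty context

def answer_score(question, context):
    """Hypothetical stand-in for the RAG model's confidence in its answer
    given (question, context); in practice this would query an LLM."""
    return 0.5 + 0.5 * (len(context) / (len(context) + 1))

def uncertainty_profile(question, sentences, drop_rates, samples=8, seed=0):
    """For each perturbation strength, record mean and spread of answer scores."""
    rng = random.Random(seed)
    profile = []
    for rate in drop_rates:
        scores = [answer_score(question, perturb_context(sentences, rate, rng))
                  for _ in range(samples)]
        profile.append((rate, mean(scores), pstdev(scores)))
    return profile

def correctness_estimate(profile):
    """Treat answers whose scores stay stable under increasing perturbation
    as more likely to be correct (a simple proxy, not the paper's metric)."""
    stds = [s for _, _, s in profile]
    return max(0.0, 1.0 - mean(stds))
```

In this toy form, the estimate could then be compared against ground-truth correctness labels (e.g., via MSE) to check how well perturbation stability tracks actual answer quality.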
2024
Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data
Yanda Li | Chi Zhang | Gang Yu | Wanqi Yang | Zhibin Wang | Bin Fu | Guosheng Lin | Chunhua Shen | Ling Chen | Yunchao Wei
Findings of the Association for Computational Linguistics: ACL 2024
The remarkable multimodal capabilities demonstrated by OpenAI’s GPT-4 have sparked significant interest in the development of multimodal Large Language Models (LLMs). A primary research objective of such models is to align visual and textual modalities effectively while comprehending human instructions. Current methodologies often rely on annotations derived from benchmark datasets to construct image-dialogue datasets for training, akin to instruction tuning in LLMs. However, these datasets often exhibit domain bias, potentially constraining the generative capabilities of the models. To mitigate these limitations, we propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning. This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models to yield a diverse and controllable dataset with varied image content. This not only provides greater flexibility compared to existing methodologies but also significantly enhances several model capabilities. Our research includes comprehensive experiments conducted on various datasets using the open-source LLaVA model as a testbed for our proposed pipeline. Our results underscore marked enhancements across more than ten commonly assessed capabilities.
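As a rough illustration of the synchronous image-dialogue synthesis described above, the sketch below wires a caption-plus-dialogue generator to a text-to-image renderer. The function names, prompt format, and data layout are illustrative assumptions, not the authors’ pipeline; generate_caption_and_dialogue and render_image are placeholders for calls to a ChatGPT-style model and a text-to-image model.

```python
# Minimal sketch of a synchronous image-dialogue synthesis pipeline in the
# spirit of the abstract; all names and formats are illustrative assumptions.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class VisualInstructionSample:
    image_prompt: str                 # caption sent to the text-to-image model
    dialogue: List[Tuple[str, str]]   # (question, answer) turns about the image
    image_path: str                   # where the rendered image would be saved

def generate_caption_and_dialogue(topic: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Placeholder for a ChatGPT-style call that returns an image caption plus
    instruction-following QA turns grounded in that caption."""
    caption = f"A photo illustrating {topic}"
    dialogue = [(f"What does the image show about {topic}?",
                 f"The image depicts {topic} in a realistic scene.")]
    return caption, dialogue

def render_image(caption: str, out_path: str) -> str:
    """Placeholder for a text-to-image model (e.g., a diffusion model) that
    renders the caption and writes the image to out_path."""
    return out_path

def synthesize(topics: List[str]) -> List[VisualInstructionSample]:
    """Synthesize image prompts, dialogues, and images in one pass, so the
    dialogue is always grounded in the caption used to render the image."""
    samples = []
    for i, topic in enumerate(topics):
        caption, dialogue = generate_caption_and_dialogue(topic)
        image_path = render_image(caption, f"images/sample_{i}.png")
        samples.append(VisualInstructionSample(caption, dialogue, image_path))
    return samples
```

The key design point this sketch tries to capture is that the caption and the dialogue are produced together, so image content can be varied and controlled without relying on annotations from existing benchmark datasets.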
2022
Empathetic and Emotionally Positive Conversation Systems with an Emotion-specific Query-Response Memory
Zhiliang Tian | Yinliang Wang | Yiping Song | Chi Zhang | Dongkyu Lee | Yingxiu Zhao | Dongsheng Li | Nevin L. Zhang
Findings of the Association for Computational Linguistics: EMNLP 2022
Emotional conversation systems generate responses to input queries while considering the speaker’s emotions in a conversation. Existing emotional conversation systems output emotional responses according to either a given emotion or the user’s emotion reflected in the input queries. Following a given emotion may lead to an emotional drift between the given emotion and the conversation state, and following only the user’s emotion may aggravate the user’s negative feelings if the user is in a negative mood. In this paper, we propose to generate empathetic responses catering to the user’s emotions while steering the conversation toward being emotionally positive. Specifically, by abstracting the conversation corpus, we extract the responding strategies for different user emotions and conversational topics and store them in a memory. We encourage positive emotions in conversation via a sentiment evaluator. We model the memory outputs with a Gaussian mixture distribution and sample a final responding strategy from the distribution. The strategy acts as a condition to a transformer model that generates responses. The experiments verify that our model surpasses the baseline methods in appropriateness, diversity, and generating emotionally positive responses.
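The memory-plus-mixture sampling step in the abstract can be sketched as follows. The vector dimensions, retrieval keys, and the simplified mixture (one equally weighted component per memory entry with a fixed standard deviation) are assumptions for illustration, not the paper’s implementation; the sampled vector stands in for the strategy that conditions the transformer decoder.

```python
# Minimal sketch of sampling a responding strategy from a Gaussian mixture over
# emotion-specific memory entries; all dimensions and keys are illustrative.

import numpy as np

def build_memory(corpus):
    """Map (emotion, topic) keys to arrays of strategy vectors abstracted from
    the conversation corpus (here: random placeholders of dimension 16)."""
    rng = np.random.default_rng(0)
    return {key: rng.normal(size=(n, 16)) for key, n in corpus.items()}

def sample_strategy(memory, emotion, topic, rng=None):
    """Treat each retrieved strategy vector as one mixture component (equal
    weights, fixed spread) and sample a final strategy from the mixture."""
    if rng is None:
        rng = np.random.default_rng()
    vectors = memory[(emotion, topic)]               # retrieved strategies, (n, d)
    weights = np.full(len(vectors), 1.0 / len(vectors))
    component = rng.choice(len(vectors), p=weights)  # pick a mixture component
    mean, std = vectors[component], 0.1              # fixed std as a simplification
    return rng.normal(mean, std)                     # condition vector for the decoder

# Usage: retrieve strategies for a sad user talking about work and sample one.
memory = build_memory({("sad", "work"): 3, ("happy", "travel"): 2})
strategy = sample_strategy(memory, "sad", "work")
```

Sampling from a mixture rather than picking a single stored strategy is what gives the generator some diversity while still staying close to strategies observed for that emotion and topic.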