Xin Yu
2025
RichRAG: Crafting Rich Responses for Multi-faceted Queries in Retrieval-Augmented Generation
Shuting Wang
|
Xin Yu
|
Mang Wang
|
Weipeng Chen
|
Yutao Zhu
|
Zhicheng Dou
Proceedings of the 31st International Conference on Computational Linguistics
Retrieval-augmented generation (RAG) effectively addresses issues of static knowledge and hallucination in large language models. Existing studies mostly focus on question scenarios with clear user intents and concise answers. However, it is prevalent that users issue broad, open-ended queries with diverse sub-intents, for which they desire rich and long-form answers covering multiple relevant aspects. To tackle this important yet underexplored problem, we propose a novel RAG framework, namely RichRAG. It includes a sub-aspect explorer to identify potential sub-aspects of input questions, a multi-faceted retriever to build a candidate pool of diverse external documents related to these sub-aspects, and a generative list-wise ranker, which is a key module to provide the top-k most valuable documents for the final generator. These ranked documents sufficiently cover various query aspects and are aware of the generator’s preferences, hence incentivizing it to produce rich and comprehensive responses for users. The training of our ranker involves a supervised fine-tuning stage to ensure the basic coverage of documents, and a reinforcement learning stage to align downstream LLM’s preferences to the ranking of documents. Experimental results on two publicly available datasets prove that our framework effectively and efficiently provides comprehensive and satisfying responses to users.
2024
An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models
Fatemeh Shiri
|
Xiao-Yu Guo
|
Mona Golestan Far
|
Xin Yu
|
Reza Haf
|
Yuan-Fang Li
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Large Multimodal Models (LMMs) have achieved strong performance across a range of vision and language tasks. However, their spatial reasoning capabilities are under-investigated. In this paper, we construct a novel VQA dataset, Spatial-MM, to comprehensively study LMMs’ spatial understanding and reasoning capabilities. Our analyses on object-relationship and multi-hop reasoning reveal several important findings. Firstly, bounding boxes and scene graphs, even synthetic ones, can significantly enhance LMMs’ spatial reasoning. Secondly, LMMs struggle more with questions posed from the human perspective than the camera perspective about the image. Thirdly, chain of thought (CoT) prompting does not improve model performance on complex multi-hop questions involving spatial relations. Moreover, spatial reasoning steps are much less accurate than non-spatial ones across MLLMs. Lastly, our perturbation analysis on GQA-spatial reveals that LMMs are much stronger at basic object detection than complex spatial reasoning. We believe our new benchmark dataset and in-depth analyses can spark further research on LMMs spatial reasoning.
Search
Fix data
Co-authors
- Weipeng Chen 1
- Zhicheng Dou (窦志成) 1
- Mona Golestan Far 1
- Xiao-Yu Guo 1
- Reza Haf 1
- show all...