Qi Zhang

2026

Retrieval-augmented generation (RAG) has become a widely adopted paradigm for analysis over financial documents. However, existing benchmarks fail to capture realistic financial analysis settings that involve cross-document retrieval, multi-page evidence integration, and diverse analytical tasks. To address this gap, we introduce FinMRAGBench, a comprehensive multi-modal financial RAG benchmark in which most questions require retrieving evidence scattered across multiple pages and documents, constructed from large-scale real-world annual reports and comprising 887 expert-verified QA pairs spanning five representative financial analysis tasks. Moreover, we introduce FinMRAGAgent, an agent trained on high-quality agentic trajectories following the reasoning-and-acting (ReAct) paradigm, capable of dynamic tool invocation and multi-step financial analysis. Our extensive experiments show that current multi-modal RAG systems still struggle with incomplete retrieval and complex financial reasoning. In contrast, FinMRAGAgent achieves the strongest overall performance across all models, demonstrating that our structured reasoning approach significantly enhances multi-modal RAG in realistic financial scenarios. The code and data are available at https://github.com/sqyangit/FinMRAGBench.

With the rapid progress of large language models (LLMs), aligning a general-purpose model with downstream tasks through fine-tuning has become a central research focus. Selecting only high-quality examples for training has been shown to be one of the most effective ways to improve fine-tuning performance. However, prior work concentrates almost exclusively on data preprocessing, that is, filtering and cleaning data before training begins, while the order and composition of data during training have received little fine-grained attention. To fill this gap, we propose Fine-Grained Order Fine-Tuning (FOT), a method that schedules the order of training data at a fine granularity within each epoch. Drawing on curriculum-learning principles, FOT defines data difficulty based on the relevance between the data and the model, and then dynamically schedules the training order in each epoch according to that difficulty. In both large-scale continued pre-training and small-scale supervised fine-tuning experiments, FOT achieves an average improvement of 2.4% over baselines. Our study offers a new perspective on data governance in the fine-tuning phase.
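
The per-epoch scheduling idea in this abstract can be illustrated with a minimal sketch. This is not the authors' FOT implementation: the abstract does not say how difficulty is computed, so the sketch assumes the current model's per-example loss as a stand-in for "relevance between the data and the model", and it assumes an easiest-first order. `ToyModel`, `order_for_epoch`, and `train_with_dynamic_order` are hypothetical names introduced only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[float, float]  # (input x, target y)


class ToyModel:
    """Hypothetical one-parameter model y = w * x, trained by SGD."""

    def __init__(self, lr: float = 0.05) -> None:
        self.w = 0.0
        self.lr = lr

    def difficulty(self, ex: Example) -> float:
        # Assumption: difficulty is the current squared error on the example.
        x, y = ex
        return (self.w * x - y) ** 2

    def update(self, ex: Example) -> None:
        # One SGD step on the squared error.
        x, y = ex
        self.w -= self.lr * 2.0 * (self.w * x - y) * x


def order_for_epoch(
    examples: Sequence[Example], difficulty: Callable[[Example], float]
) -> List[Example]:
    """Sort examples easiest-first under the current difficulty scores."""
    return sorted(examples, key=difficulty)


def train_with_dynamic_order(
    examples: Sequence[Example],
    difficulty: Callable[[Example], float],
    update: Callable[[Example], None],
    epochs: int,
) -> List[List[Example]]:
    """Re-score and re-order the data at the start of every epoch."""
    schedule = []
    for _ in range(epochs):
        ordered = order_for_epoch(examples, difficulty)
        for ex in ordered:
            update(ex)
        schedule.append(ordered)
    return schedule


model = ToyModel()
data = [(3.0, 6.0), (1.0, 2.0), (2.0, 4.0)]
schedule = train_with_dynamic_order(data, model.difficulty, model.update, epochs=3)
for epoch, ordered in enumerate(schedule):
    print(epoch, ordered)
```

Because difficulty is recomputed from the current model state before each epoch, the order follows the model as it learns instead of being fixed once during preprocessing.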

Tabular data is widely used in fields such as finance and healthcare. Traditional tree-based models are prevalent for tabular prediction tasks due to their ability to handle heterogeneous features. However, their heavy reliance on feature engineering limits both their generalizability and their human-readable interpretability. On the other hand, Large Language Models (LLMs) naturally provide intermediate reasoning steps, thus offering greater transparency in decision-making. Nevertheless, LLMs often fail to match the predictive performance of tree-based models on tabular data. To address these challenges, we propose a novel Logic-Graph-Enhanced LLM Reasoning (LogGER) framework that integrates the strengths of tree-based models and LLMs. Specifically, we reformulate the traditional decision tree as a human-readable logic graph, which explicitly models the causal relationships between features and targets. This logic graph is automatically constructed using LLMs based on data priors and serves as the foundation for LogGER. To fully leverage the logic graph, we further introduce a logic-graph-guided process supervision approach, which evaluates and enhances the quality of the LLM’s intermediate reasoning steps using a logic-graph-aided process reward. Extensive experiments demonstrate that LogGER consistently outperforms both tree-based models and state-of-the-art LLM methods on a variety of tabular prediction tasks, achieving superior accuracy and interpretability.
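
As a rough illustration of scoring reasoning steps against a logic graph, the sketch below uses a small hand-written graph and a deliberately simple reward. It is not LogGER's graph construction or reward: the abstract gives no such details, so the graph contents, the `step_rewards` and `process_reward` helpers, and the "fraction of steps that follow a graph edge" rule are all assumptions made for illustration.

```python
from typing import Dict, List, Set

# A logic graph as an adjacency map: condition -> conditions or outcomes it leads to.
LogicGraph = Dict[str, Set[str]]

# Hypothetical hand-written graph for a toy credit-risk table.
graph: LogicGraph = {
    "income > 50k": {"low default risk"},
    "income <= 50k": {"debt ratio > 0.4", "debt ratio <= 0.4"},
    "debt ratio > 0.4": {"high default risk"},
    "debt ratio <= 0.4": {"low default risk"},
}


def step_rewards(graph: LogicGraph, steps: List[str]) -> List[float]:
    """Give 1.0 to each consecutive pair of steps that is an edge in the graph."""
    return [
        1.0 if nxt in graph.get(cur, set()) else 0.0
        for cur, nxt in zip(steps, steps[1:])
    ]


def process_reward(graph: LogicGraph, steps: List[str]) -> float:
    """Average the step-level rewards; an empty or one-step chain scores 0.0."""
    rewards = step_rewards(graph, steps)
    return sum(rewards) / len(rewards) if rewards else 0.0


consistent = ["income <= 50k", "debt ratio > 0.4", "high default risk"]
inconsistent = ["income > 50k", "debt ratio > 0.4", "high default risk"]

print(process_reward(graph, consistent))
print(process_reward(graph, inconsistent))
```

Scoring each transition separately, instead of only the final answer, is what makes this a process-level signal: a chain can reach a plausible conclusion and still be penalized for a step the graph does not support.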