Decompose, Retrieve, Cite: A RAG Pipeline for Structured Report Generation from Technical Documentation

Himanshu Dhurve; Sreedath Panat; Rajat Dandekar; Raj Dandekar

Abstract

Retrieval-Augmented Generation (RAG) grounds language-model output in external knowledge, yet its application to dense technical documentation remains largely unexplored. Engineering software manuals pose compounding challenges: formulae are corrupted during PDF extraction, heterogeneous content types require different parsing treatment, and queries demand cross-document synthesis across multiple reference volumes. We present an end-to-end RAG system for OpenFOAM, an open-source computational fluid dynamics toolkit, operating in two modes. In single-query mode, a formula-preserving parser (Marker), adaptive header-aware chunking, two-stage dense-then-rerank retrieval, and a citation-enforcement prompt produce grounded, source-attributed answers across a 20-question benchmark. In report mode, a user prompt is decomposed into sub-questions via LLM planning; each sub-question undergoes independent retrieval and cross-encoder re-ranking, and the deduplicated chunk set is passed to a long-context generation call that produces a structured, multi-section report with inline citations. Evaluated on a 10-prompt golden set with a six-dimension LLM-as-a-judge framework, both pipelines achieve overall scores above 4.6/5.0 with perfect citation correctness (5.0/5.0). The decomposed pipeline demonstrates superior robustness (90% vs 70% judge success rate). Retrieval analysis using page-level ground truth reveals low absolute recall (<14%), identifying retrieval breadth as the primary bottleneck.
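
The report-mode flow described in the abstract (decompose, retrieve and re-rank per sub-question, deduplicate, then generate with citations) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Chunk` type, the function names, the `first_stage_k`/`final_k` parameters, and the prompt wording are all hypothetical, and a toy token-overlap score stands in for both the dense retriever and the cross-encoder re-ranker. The LLM planning step is assumed to have already produced the list of sub-questions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Chunk:
    """A retrievable passage with the source fields needed for a citation."""
    doc: str
    page: int
    text: str


Scorer = Callable[[str, Chunk], float]


def overlap_score(query: str, chunk: Chunk) -> float:
    """Toy relevance score: fraction of query tokens present in the chunk.

    Assumption: stands in for a dense-embedding similarity or a
    cross-encoder score, neither of which is reproduced here.
    """
    q = set(query.lower().split())
    c = set(chunk.text.lower().split())
    return len(q & c) / len(q) if q else 0.0


def retrieve_and_rerank(
    query: str,
    corpus: Iterable[Chunk],
    first_stage_k: int = 4,
    final_k: int = 2,
    first_stage: Scorer = overlap_score,
    reranker: Scorer = overlap_score,
) -> List[Chunk]:
    """Two-stage retrieval: take a broad candidate set, then re-rank it."""
    candidates = sorted(
        corpus, key=lambda ch: first_stage(query, ch), reverse=True
    )[:first_stage_k]
    return sorted(
        candidates, key=lambda ch: reranker(query, ch), reverse=True
    )[:final_k]


def gather_context(
    sub_questions: Iterable[str], corpus: List[Chunk], **kwargs
) -> List[Chunk]:
    """Retrieve independently per sub-question and merge without duplicates."""
    seen = set()
    merged: List[Chunk] = []
    for sub_question in sub_questions:
        for chunk in retrieve_and_rerank(sub_question, corpus, **kwargs):
            if chunk not in seen:  # frozen dataclass, so chunks are hashable
                seen.add(chunk)
                merged.append(chunk)
    return merged


def build_report_prompt(
    user_prompt: str, sub_questions: List[str], chunks: List[Chunk]
) -> str:
    """Assemble one generation prompt with numbered, citable sources."""
    sources = "\n".join(
        f"[{i}] ({ch.doc}, p. {ch.page}) {ch.text}"
        for i, ch in enumerate(chunks, start=1)
    )
    outline = "\n".join(f"- {sq}" for sq in sub_questions)
    return (
        f"Task: {user_prompt}\n\n"
        f"Cover these sub-questions, one section each:\n{outline}\n\n"
        f"Sources:\n{sources}\n\n"
        "Cite every claim with its source number, e.g. [1]. "
        "Do not state anything the sources do not support."
    )
```

In this sketch the string returned by `build_report_prompt` would be sent to a long-context model in a single call. Deduplicating before generation keeps a chunk that is relevant to several sub-questions from consuming context space more than once.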

Anthology ID: 2026.rag4reports-1.4
Volume: Proceedings of the 1st Workshop on Multilingual Report Generation via Retrieval Augmented Generation (RAG4Reports 2026)
Month: July
Year: 2026
Address: San Diego, CA, USA
Editors: Eugene Yang, Dawn Lawrie, Sean MacAvaney, James Mayfield, Luca Soldaini, Andrew Yates
Venues: RAG4Reports | WS
Publisher: Association for Computational Linguistics
Pages: 24–35
URL: https://aclanthology.org/2026.rag4reports-1.4/
Cite (ACL): Himanshu Dhurve, Sreedath Panat, Rajat Dandekar, and Raj Dandekar. 2026. Decompose, Retrieve, Cite: A RAG Pipeline for Structured Report Generation from Technical Documentation. In Proceedings of the 1st Workshop on Multilingual Report Generation via Retrieval Augmented Generation (RAG4Reports 2026), pages 24–35, San Diego, CA, USA. Association for Computational Linguistics.
Cite (Informal): Decompose, Retrieve, Cite: A RAG Pipeline for Structured Report Generation from Technical Documentation (Dhurve et al., RAG4Reports 2026)
PDF: https://aclanthology.org/2026.rag4reports-1.4.pdf
