Pedro Garcia


2026

Dense retrieval is a critical component of Retrieval-Augmented Generation (RAG) systems, and its effectiveness depends strongly on how documents are represented. In consumer complaint settings, raw interaction texts are often lengthy and noisy, which limits retrieval effectiveness. This paper investigates whether schema-guided structured summaries can improve dense retrieval in RAG. We compare embeddings derived from raw interaction texts with embeddings derived from LLM-generated structured summaries in a controlled evaluation on Portuguese-language consumer complaints. Summary-based retrieval achieves a Recall@1 of 0.527, compared to 0.001 when indexing raw interactions, a gain of more than two orders of magnitude, and reaches a Recall@10 of 0.610. These results indicate that structured summaries enable more effective and reliable retrieval at low cutoffs, making them particularly suitable for RAG pipelines.
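For readers unfamiliar with the metric, Recall@k can be sketched as follows. This is a minimal illustration and not the evaluation code used in the paper: it assumes each query has exactly one relevant document and that ranking is by cosine similarity between embeddings. The function name `recall_at_k` and the toy vectors are hypothetical.

```python
import numpy as np


def recall_at_k(query_vecs, doc_vecs, relevant_idx, k):
    """Fraction of queries whose single relevant document appears in the top-k.

    Assumption (not from the paper): one relevant document per query,
    documents ranked by cosine similarity to the query embedding.
    """
    q = np.asarray(query_vecs, dtype=float)
    d = np.asarray(doc_vecs, dtype=float)
    # L2-normalise so that the dot product equals cosine similarity.
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    scores = q @ d.T
    # Indices of the k highest-scoring documents for each query.
    top_k = np.argsort(-scores, axis=1)[:, :k]
    hits = (top_k == np.asarray(relevant_idx)[:, None]).any(axis=1)
    return float(hits.mean())


# Toy embeddings standing in for real document and query vectors.
docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[0.9, 0.1], [0.1, 0.9]]
relevant = [0, 2]  # index of the relevant document for each query

r1 = recall_at_k(queries, docs, relevant, k=1)  # 0.5: only the first query hits at rank 1
r2 = recall_at_k(queries, docs, relevant, k=2)  # 1.0: the second query hits at rank 2
```

Under this definition, a Recall@1 of 0.527 means that the correct document is ranked first for roughly half of the queries.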