RE-FIN: Retrieval-based Enrichment for Financial data

Lorenzo Malandri, Fabio Mercorio, Mario Mezzanzanica, Filippo Pallucchini


Abstract
Enriching sentences with knowledge from qualitative sources benefits various NLP tasks and enhances the use of labeled data in model training. This is crucial for Financial Sentiment Analysis (FSA), where texts are often brief and contain implied information. We introduce RE-FIN (Retrieval-based Enrichment for FINancial data), an automated system designed to retrieve information from a knowledge base to enrich financial sentences, making them more knowledge-dense and explicit. RE-FIN generates propositions from the knowledge base and employs Retrieval-Augmented Generation (RAG) to augment the original text with relevant information. A large language model (LLM) rewrites the original sentence, incorporating this data. Since the LLM does not create new content, the risk of hallucinations is significantly reduced. The LLM generates multiple new sentences using different relevant information from the knowledge base; we developed an algorithm to select the one that best preserves the meaning of the original sentence while avoiding excessive syntactic similarity. Results show that the enriched sentences have lower perplexity than the original ones and improve performance on FSA.
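The selection step described in the abstract (keep the candidate closest in meaning to the original, but reject candidates that are too similar on the surface) can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the 0.8 threshold, and both similarity measures are assumptions chosen so the example runs with the Python standard library alone. In particular, bag-of-words cosine is only a crude stand-in for a semantic similarity model such as sentence embeddings, and token-level edit similarity is a stand-in for a syntactic similarity measure.

```python
import difflib
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def meaning_score(a, b):
    """ASSUMPTION: bag-of-words cosine as a crude proxy for meaning preservation."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def surface_score(a, b):
    """ASSUMPTION: token-sequence similarity as a proxy for syntactic similarity."""
    return difflib.SequenceMatcher(None, tokens(a), tokens(b)).ratio()


def select_candidate(original, candidates, max_surface=0.8):
    """Pick the candidate with the highest meaning score among those whose
    surface similarity to the original does not exceed max_surface.
    Falls back to all candidates if none pass the filter."""
    eligible = [c for c in candidates if surface_score(original, c) <= max_surface]
    pool = eligible or candidates
    return max(pool, key=lambda c: meaning_score(original, c))
```

Given a hypothetical original such as "Acme shares fell after earnings." and three candidates (a verbatim copy, an enriched rewrite, and an unrelated sentence), the verbatim copy is filtered out for being too similar on the surface, and the enriched rewrite wins over the unrelated sentence on the meaning score.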
Anthology ID:
2025.coling-industry.62
Volume:
Proceedings of the 31st International Conference on Computational Linguistics: Industry Track
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert, Kareem Darwish, Apoorv Agarwal
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
751–759
URL:
https://aclanthology.org/2025.coling-industry.62/
Cite (ACL):
Lorenzo Malandri, Fabio Mercorio, Mario Mezzanzanica, and Filippo Pallucchini. 2025. RE-FIN: Retrieval-based Enrichment for Financial data. In Proceedings of the 31st International Conference on Computational Linguistics: Industry Track, pages 751–759, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
RE-FIN: Retrieval-based Enrichment for Financial data (Malandri et al., COLING 2025)
PDF:
https://aclanthology.org/2025.coling-industry.62.pdf