@inproceedings{degenaro-etal-2025-fortify,
title = "{FORTIFY}: Generative Model Fine-tuning with {ORPO} for {R}e{T}rieval Expansion of {I}n{F}ormal {N}ois{Y} Text",
author = "DeGenaro, Dan and
Yang, Eugene and
Etter, David and
Carpenter, Cameron and
Sanders, Kate and
Martin, Alexander and
Murray, Kenton and
Kriz, Reno",
editor = "Kriz, Reno and
Murray, Kenton",
booktitle = "Proceedings of the 1st Workshop on Multimodal Augmented Generation via Multimodal Retrieval (MAGMaR 2025)",
month = aug,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.magmar-1.13/",
doi = "10.18653/v1/2025.magmar-1.13",
pages = "100--115",
ISBN = "979-8-89176-280-0",
abstract = "Despite recent advancements in neural retrieval, representing text fragments or phrases with proper contextualized embeddings is still challenging. Particularly in video retrieval, where documents are text extracted through OCR from the frames or ASR from audio tracks, the textual content is rarely complete sentences but only a bag of phrases. In this work, we propose FORTIFY, a generative model fine-tuning approach for noisy document rewriting and summarization, to improve the downstream retrieval effectiveness. By experimenting on MultiVENT 2.0, an informational video retrieval benchmark, we show Llama fine-tuned with FORTIFY provides an effective document expansion, leading to a 30{\%} improvement over prompting an out-of-box Llama model on nDCG@10. Zero-shot transferring the model tailored for MultiVENT 2.0 to two out-of-distribution datasets still demonstrates competitive retrieval effectiveness to other document preprocessing alternatives."
}<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="degenaro-etal-2025-fortify">
<titleInfo>
<title>FORTIFY: Generative Model Fine-tuning with ORPO for ReTrieval Expansion of InFormal NoisY Text</title>
</titleInfo>
<name type="personal">
<namePart type="given">Dan</namePart>
<namePart type="family">DeGenaro</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Eugene</namePart>
<namePart type="family">Yang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">David</namePart>
<namePart type="family">Etter</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Cameron</namePart>
<namePart type="family">Carpenter</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Kate</namePart>
<namePart type="family">Sanders</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Alexander</namePart>
<namePart type="family">Martin</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Kenton</namePart>
<namePart type="family">Murray</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Reno</namePart>
<namePart type="family">Kriz</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-08</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 1st Workshop on Multimodal Augmented Generation via Multimodal Retrieval (MAGMaR 2025)</title>
</titleInfo>
<name type="personal">
<namePart type="given">Reno</namePart>
<namePart type="family">Kriz</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Kenton</namePart>
<namePart type="family">Murray</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Vienna, Austria</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-280-0</identifier>
</relatedItem>
<abstract>Despite recent advancements in neural retrieval, representing text fragments or phrases with proper contextualized embeddings is still challenging. Particularly in video retrieval, where documents are text extracted through OCR from the frames or ASR from audio tracks, the textual content is rarely complete sentences but only a bag of phrases. In this work, we propose FORTIFY, a generative model fine-tuning approach for noisy document rewriting and summarization, to improve the downstream retrieval effectiveness. By experimenting on MultiVENT 2.0, an informational video retrieval benchmark, we show Llama fine-tuned with FORTIFY provides an effective document expansion, leading to a 30% improvement over prompting an out-of-box Llama model on nDCG@10. Zero-shot transferring the model tailored for MultiVENT 2.0 to two out-of-distribution datasets still demonstrates competitive retrieval effectiveness to other document preprocessing alternatives.</abstract>
<identifier type="citekey">degenaro-etal-2025-fortify</identifier>
<identifier type="doi">10.18653/v1/2025.magmar-1.13</identifier>
<location>
<url>https://aclanthology.org/2025.magmar-1.13/</url>
</location>
<part>
<date>2025-08</date>
<extent unit="page">
<start>100</start>
<end>115</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T FORTIFY: Generative Model Fine-tuning with ORPO for ReTrieval Expansion of InFormal NoisY Text
%A DeGenaro, Dan
%A Yang, Eugene
%A Etter, David
%A Carpenter, Cameron
%A Sanders, Kate
%A Martin, Alexander
%A Murray, Kenton
%A Kriz, Reno
%Y Kriz, Reno
%Y Murray, Kenton
%S Proceedings of the 1st Workshop on Multimodal Augmented Generation via Multimodal Retrieval (MAGMaR 2025)
%D 2025
%8 August
%I Association for Computational Linguistics
%C Vienna, Austria
%@ 979-8-89176-280-0
%F degenaro-etal-2025-fortify
%X Despite recent advancements in neural retrieval, representing text fragments or phrases with proper contextualized embeddings is still challenging. Particularly in video retrieval, where documents consist of text extracted through OCR from video frames or ASR from audio tracks, the textual content is rarely complete sentences but only a bag of phrases. In this work, we propose FORTIFY, a generative model fine-tuning approach for noisy document rewriting and summarization, to improve downstream retrieval effectiveness. Experimenting on MultiVENT 2.0, an informational video retrieval benchmark, we show that Llama fine-tuned with FORTIFY provides effective document expansion, leading to a 30% improvement in nDCG@10 over prompting an out-of-the-box Llama model. Zero-shot transfer of the model tailored for MultiVENT 2.0 to two out-of-distribution datasets still demonstrates retrieval effectiveness competitive with other document preprocessing alternatives.
%R 10.18653/v1/2025.magmar-1.13
%U https://aclanthology.org/2025.magmar-1.13/
%U https://doi.org/10.18653/v1/2025.magmar-1.13
%P 100-115
Markdown (Informal)
[FORTIFY: Generative Model Fine-tuning with ORPO for ReTrieval Expansion of InFormal NoisY Text](https://aclanthology.org/2025.magmar-1.13/) (DeGenaro et al., MAGMaR 2025)
ACL
Dan DeGenaro, Eugene Yang, David Etter, Cameron Carpenter, Kate Sanders, Alexander Martin, Kenton Murray, and Reno Kriz. 2025. FORTIFY: Generative Model Fine-tuning with ORPO for ReTrieval Expansion of InFormal NoisY Text. In Proceedings of the 1st Workshop on Multimodal Augmented Generation via Multimodal Retrieval (MAGMaR 2025), pages 100–115, Vienna, Austria. Association for Computational Linguistics.