LILaC: Late Interacting in Layered Component Graph for Open-domain Multimodal Multihop Retrieval

Joohyung Yun; Doyup Lee; Wook-Shin Han

Abstract

Multimodal document retrieval aims to retrieve query-relevant components from documents composed of textual, tabular, and visual elements. An effective multimodal retriever must handle two main challenges: (1) mitigating the effect of irrelevant content introduced by fixed, single-granularity retrieval units, and (2) supporting multihop reasoning by capturing semantic relationships among components within and across documents. To address these challenges, we propose LILaC, a multimodal retrieval framework featuring two core innovations. First, we introduce a layered component graph that explicitly represents multimodal information at two layers, one coarse-grained and one fine-grained, facilitating efficient yet precise reasoning. Second, we develop a late-interaction-based subgraph retrieval method, an edge-based approach that first identifies coarse-grained nodes for efficient candidate generation and then performs fine-grained reasoning via late interaction. Extensive experiments demonstrate that LILaC achieves state-of-the-art retrieval performance on all five benchmarks, notably without additional fine-tuning. We make the artifacts publicly available at github.com/joohyung00/lilac.
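The coarse-then-fine idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a generic MaxSim-style late-interaction score (sum over query parts of the best-matching fine-grained part), it ignores the edge-based subgraph traversal entirely, and every name, vector, and node label below is hypothetical toy data chosen for illustration.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product, used here as the similarity function (an assumption)."""
    return sum(x * y for x, y in zip(a, b))


def coarse_candidates(query_vec: Vector,
                      coarse_index: Dict[str, Vector],
                      k: int) -> List[str]:
    """Stage 1: rank coarse-grained nodes by a single-vector score, keep top k."""
    ranked = sorted(coarse_index,
                    key=lambda node: dot(query_vec, coarse_index[node]),
                    reverse=True)
    return ranked[:k]


def late_interaction_score(query_parts: List[Vector],
                           fine_parts: List[Vector]) -> float:
    """Stage 2: MaxSim-style score. Each query part is matched to its
    best fine-grained part, and the per-part maxima are summed."""
    return sum(max(dot(q, f) for f in fine_parts) for q in query_parts)


def retrieve(query_vec: Vector,
             query_parts: List[Vector],
             coarse_index: Dict[str, Vector],
             fine_index: Dict[str, List[Vector]],
             k: int) -> List[str]:
    """Generate candidates at the coarse layer, then rerank them at the fine layer."""
    candidates = coarse_candidates(query_vec, coarse_index, k)
    return sorted(candidates,
                  key=lambda node: late_interaction_score(query_parts,
                                                          fine_index[node]),
                  reverse=True)


# Hypothetical toy data: one vector per coarse node, several per fine layer.
coarse_index = {
    "doc_a_table": [1.0, 0.0],
    "doc_b_paragraph": [0.7, 0.7],
    "doc_c_image": [0.0, 1.0],
}
fine_index = {
    "doc_a_table": [[1.0, 0.0], [0.9, 0.1]],
    "doc_b_paragraph": [[1.0, 0.0], [0.0, 1.0]],
    "doc_c_image": [[0.0, 1.0]],
}
query_vec = [1.0, 0.2]                    # single-vector view of the query
query_parts = [[1.0, 0.0], [0.0, 1.0]]    # fine-grained view of the query

stage1 = coarse_candidates(query_vec, coarse_index, k=2)
final = retrieve(query_vec, query_parts, coarse_index, fine_index, k=2)
print(stage1)
print(final)
```

In this toy setup the coarse stage prefers the table node, while the fine-grained rerank promotes the paragraph node because it covers both query parts. That is the kind of correction a second, finer scoring pass is meant to provide.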

Anthology ID: 2025.emnlp-main.1037
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 20551–20570
URL: https://aclanthology.org/2025.emnlp-main.1037/
Cite (ACL): Joohyung Yun, Doyup Lee, and Wook-Shin Han. 2025. LILaC: Late Interacting in Layered Component Graph for Open-domain Multimodal Multihop Retrieval. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 20551–20570, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): LILaC: Late Interacting in Layered Component Graph for Open-domain Multimodal Multihop Retrieval (Yun et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-main.1037.pdf
Checklist: 2025.emnlp-main.1037.checklist.pdf
