From Observation to Understanding: Front-Door Adjustments with Uncertainty Calibration for Enhancing Egocentric Reasoning in LVLMs

Shenshen Li; Wenxin Meng; Lei Wang; Hao Yang; Chong Peng; Peng Yan; Fumin Shen; Jingkuan Song; Heng Tao Shen; Xing Xu

doi:10.18653/v1/2025.findings-acl.979

Abstract

Recent progress in large vision-language models (LVLMs) has shown substantial potential across a broad spectrum of third-person tasks. However, adapting these LVLMs to egocentric scenarios remains challenging due to their third-person training bias. Existing methods that adapt LVLMs for first-person tasks often overlook critical agent-environment interactions, limiting their ability to perform egocentric reasoning. To address these challenges, we propose a novel zero-shot paradigm termed Front-Door Adjustments with Uncertainty Calibration (FRUIT) to enhance the egocentric reasoning abilities of LVLMs by simulating human causal reasoning. Specifically, FRUIT operates in two stages: observation and understanding. Departing from conventional prompting techniques, we formalize egocentric reasoning using a structural causal model. We then ground interaction regions and expand them into hierarchical visual cues, augmented with corresponding captions, to form the initial observations. To reduce noise in these observations, we employ uncertainty calibration to filter out unreliable information. The refined observations then serve as mediators and are incorporated into the prompt template, guiding the model to understand semantics from a first-person perspective. Extensive experiments conducted on the EgoThink benchmark demonstrate that our FRUIT method consistently enhances the performance of existing LVLMs on six distinct tasks. Our code is available at https://github.com/Mrshenshen/FRUIT.
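
For readers unfamiliar with the front-door adjustment named in the abstract, the following is a minimal sketch of the generic identity P(y | do(x)) = sum_m P(m | x) sum_x' P(y | m, x') P(x') on a toy discrete model. It is not the FRUIT implementation: the binary variables, the hidden confounder U, and every probability below are hypothetical values chosen only for illustration. In the abstract's terms, X would loosely correspond to the model input, M to the refined observations acting as mediators, and Y to the answer, but that mapping is an assumption here.

```python
from itertools import product

# Hypothetical toy structural causal model (all variables binary):
#   U -> X, U -> Y (U is an unobserved confounder), X -> M, M -> Y.
P_U = {0: 0.6, 1: 0.4}                     # P(U = u)
P_X1_GIVEN_U = {0: 0.2, 1: 0.8}            # P(X = 1 | U = u)
P_M1_GIVEN_X = {0: 0.1, 1: 0.9}            # P(M = 1 | X = x)
P_Y1_GIVEN_MU = {(0, 0): 0.1, (0, 1): 0.5,
                 (1, 0): 0.6, (1, 1): 0.9}  # P(Y = 1 | M = m, U = u)


def bern(p_one, value):
    """Probability that a Bernoulli(p_one) variable equals `value`."""
    return p_one if value == 1 else 1.0 - p_one


def joint_observed():
    """Observational joint P(x, m, y), with the hidden U summed out."""
    table = {}
    for u, x, m, y in product((0, 1), repeat=4):
        p = (P_U[u] * bern(P_X1_GIVEN_U[u], x)
             * bern(P_M1_GIVEN_X[x], m)
             * bern(P_Y1_GIVEN_MU[(m, u)], y))
        table[(x, m, y)] = table.get((x, m, y), 0.0) + p
    return table


def naive_conditional(joint, x, y=1):
    """P(Y = y | X = x), which is biased by the confounder U."""
    p_x = sum(p for (xx, _, _), p in joint.items() if xx == x)
    return sum(joint[(x, m, y)] for m in (0, 1)) / p_x


def front_door(joint, x, y=1):
    """P(Y = y | do(X = x)) estimated from observed (x, m, y) only."""
    p_x = {a: sum(p for (xx, _, _), p in joint.items() if xx == a)
           for a in (0, 1)}
    total = 0.0
    for m in (0, 1):
        p_m_given_x = sum(joint[(x, m, yy)] for yy in (0, 1)) / p_x[x]
        inner = 0.0
        for x2 in (0, 1):
            p_x2_m = sum(joint[(x2, m, yy)] for yy in (0, 1))
            inner += (joint[(x2, m, y)] / p_x2_m) * p_x[x2]
        total += p_m_given_x * inner
    return total


def true_do(x, y=1):
    """Ground-truth P(Y = y | do(X = x)), computed with access to U."""
    return sum(P_U[u] * bern(P_M1_GIVEN_X[x], m)
               * bern(P_Y1_GIVEN_MU[(m, u)], y)
               for u in (0, 1) for m in (0, 1))


if __name__ == "__main__":
    joint = joint_observed()
    for x in (0, 1):
        print(x, naive_conditional(joint, x), front_door(joint, x), true_do(x))
```

In this toy model the front-door estimate matches the interventional ground truth even though U is never observed, whereas the naive conditional P(Y | X) does not, which is the motivation for routing the effect through a mediator.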

Anthology ID: 2025.findings-acl.979
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 19152–19169
URL: https://aclanthology.org/2025.findings-acl.979/
DOI: 10.18653/v1/2025.findings-acl.979
Cite (ACL): Shenshen Li, Wenxin Meng, Lei Wang, Hao Yang, Chong Peng, Peng Yan, Fumin Shen, Jingkuan Song, Heng Tao Shen, and Xing Xu. 2025. From Observation to Understanding: Front-Door Adjustments with Uncertainty Calibration for Enhancing Egocentric Reasoning in LVLMs. In Findings of the Association for Computational Linguistics: ACL 2025, pages 19152–19169, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): From Observation to Understanding: Front-Door Adjustments with Uncertainty Calibration for Enhancing Egocentric Reasoning in LVLMs (Li et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-acl.979.pdf
