Read Before Grounding: Scene Knowledge Visual Grounding via Multi-step Parsing

Authors: HaiXiang Zhu, Lixian Su, ShuangMing Mao, Jing Ye
Date: 2025-01
Venue: Proceedings of the 31st International Conference on Computational Linguistics
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Publisher: Association for Computational Linguistics
Location: Abu Dhabi, UAE
Type: conference publication
Anthology ID: zhu-etal-2025-read
URL: https://aclanthology.org/2025.coling-main.76/
Pages: 1136-1149