InstructPart: Task-Oriented Part Segmentation with Instruction Reasoning

Zifu Wan; Yaqi Xie; Ce Zhang; Zhiqiu Lin; Zihan Wang; Simon Stepputtis; Deva Ramanan; Katia P. Sycara

doi:10.18653/v1/2025.acl-long.1179

Abstract

Large multimodal foundation models, particularly in the domains of language and vision, have significantly advanced various tasks, including robotics, autonomous driving, information retrieval, and grounding. However, many of these models perceive objects as indivisible, overlooking the components that constitute them. Understanding these components and their associated affordances provides valuable insights into an object’s functionality, which is fundamental for performing a wide range of tasks. In this work, we introduce a novel real-world benchmark, InstructPart, comprising hand-labeled part segmentation annotations and task-oriented instructions to evaluate the performance of current models in understanding and executing part-level tasks within everyday contexts. Through our experiments, we demonstrate that task-oriented part segmentation remains a challenging problem, even for state-of-the-art Vision-Language Models (VLMs). In addition to our benchmark, we introduce a simple baseline that achieves a twofold performance improvement through fine-tuning with our dataset. With our dataset and benchmark, we aim to facilitate research on task-oriented part segmentation and enhance the applicability of VLMs across various domains, including robotics, virtual reality, information retrieval, and other related fields. Project website: https://zifuwan.github.io/InstructPart/.

Anthology ID: 2025.acl-long.1179
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 24202–24227
URL: https://aclanthology.org/2025.acl-long.1179/
DOI: 10.18653/v1/2025.acl-long.1179
Cite (ACL): Zifu Wan, Yaqi Xie, Ce Zhang, Zhiqiu Lin, Zihan Wang, Simon Stepputtis, Deva Ramanan, and Katia P. Sycara. 2025. InstructPart: Task-Oriented Part Segmentation with Instruction Reasoning. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 24202–24227, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): InstructPart: Task-Oriented Part Segmentation with Instruction Reasoning (Wan et al., ACL 2025)
PDF: https://aclanthology.org/2025.acl-long.1179.pdf
