i-Code Studio: A Configurable and Composable Framework for Integrative AI

Yuwei Fang, Mahmoud Khademi, Chenguang Zhu, Ziyi Yang, Reid Pryzant, Yichong Xu, Yao Qian, Takuya Yoshioka, Lu Yuan, Michael Zeng, Xuedong Huang


Abstract
Artificial General Intelligence (AGI) requires comprehensive understanding and generation capabilities for a variety of tasks spanning different modalities and functionalities. Integrative AI is one important direction to approach AGI, through combining multiple models to tackle complex multimodal tasks. However, there is a lack of a flexible and composable platform to facilitate efficient and effective model composition and coordination. In this paper, we propose the i-Code Studio, a configurable and composable framework for Integrative AI. The i-Code Studio orchestrates multiple pre-trained models in a finetuning-free fashion to conduct complex multimodal tasks. Instead of simple model composition, the i-Code Studio provides an integrative, flexible, and composable setting for developers to quickly and easily compose cutting-edge services and technologies tailored to their specific requirements. The i-Code Studio achieves impressive results on a variety of zero-shot multimodal tasks, such as video-to-text retrieval, speech-to-speech translation, and visual question answering. We also demonstrate how to quickly build a multimodal agent based on the i-Code Studio that can communicate and personalize for users. The project page with demonstrations and code is at https://i-code-studio.github.io/.
Anthology ID:
2024.emnlp-demo.2
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Delia Irazu Hernandez Farias, Tom Hope, Manling Li
Venue:
EMNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
14–24
Language:
URL:
https://aclanthology.org/2024.emnlp-demo.2
DOI:
Bibkey:
Cite (ACL):
Yuwei Fang, Mahmoud Khademi, Chenguang Zhu, Ziyi Yang, Reid Pryzant, Yichong Xu, Yao Qian, Takuya Yoshioka, Lu Yuan, Michael Zeng, and Xuedong Huang. 2024. i-Code Studio: A Configurable and Composable Framework for Integrative AI. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 14–24, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
i-Code Studio: A Configurable and Composable Framework for Integrative AI (Fang et al., EMNLP 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.emnlp-demo.2.pdf