mrCAD: Multimodal Communication to Refine Computer-aided Designs

William P McCarthy, Saujas Vaduguru, Karl D.d. Willis, Justin Matejka, Judith E Fan, Daniel Fried, Yewen Pu


Abstract
In collaborative creation tasks, people steer artifacts towards specific goals by _refining_ them with _multimodal_ communication over multiple rounds of interaction. In contrast, generative AI excels at creating artifacts in a single turn but can struggle to make precise refinements that match our design intent. To close this gap, we present mrCAD, a dataset of multi-turn interactions in which pairs of humans iteratively created and refined computer-aided designs (CADs). In each game, a _Designer sent instructions to a _Maker_, explaining how to create and subsequently refine a CAD to match a target design that only the _Designer_ could see. mrCAD consists of 6,082 communication games, 15,163 instruction-execution rounds, played between 1,092 pairs of human players. Crucially, _Designers_ had access to two communication modalities – text and drawing. Analysis finds that players relied more on text in refinement than in initial generation instructions, and used different linguistic elements for refinement than for generation. We also find that state-of-the-art VLMs are better at following generation instructions than refinement instructions. These results lay the foundation for modeling multi-turn, multimodal communication not captured in prior datasets.
Anthology ID:
2025.findings-emnlp.1248
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
22905–22921
Language:
URL:
https://aclanthology.org/2025.findings-emnlp.1248/
DOI:
Bibkey:
Cite (ACL):
William P McCarthy, Saujas Vaduguru, Karl D.d. Willis, Justin Matejka, Judith E Fan, Daniel Fried, and Yewen Pu. 2025. mrCAD: Multimodal Communication to Refine Computer-aided Designs. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 22905–22921, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
mrCAD: Multimodal Communication to Refine Computer-aided Designs (McCarthy et al., Findings 2025)
Copy Citation:
PDF:
https://aclanthology.org/2025.findings-emnlp.1248.pdf
Checklist:
 2025.findings-emnlp.1248.checklist.pdf