@inproceedings{suresh-etal-2026-modeling,
    title = "Modeling Turn-Taking with Semantically Informed Gestures",
    author = "Suresh, Varsha and
      Mughal, M. Hamza and
      Theobalt, Christian and
      Demberg, Vera",
    editor = "Demberg, Vera and
      Inui, Kentaro and
      Marquez, Llu{\'i}s",
    booktitle = "Findings of the {A}ssociation for {C}omputational {L}inguistics: {EACL} 2026",
    month = mar,
    year = "2026",
    address = "Rabat, Morocco",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2026.findings-eacl.106/",
    pages = "2034--2041",
    ISBN = "979-8-89176-386-9",
    abstract = "In conversation, humans use multimodal cues, such as speech, gestures, and gaze, to manage turn-taking. While linguistic and acoustic features are informative, gestures provide complementary cues for modeling these transitions. To study this, we introduce DnD Gesture++, an extension of the multi-party DnD Gesture corpus enriched with 2,663 semantic gesture annotations spanning iconic, metaphoric, deictic, and discourse types. Using this dataset, we model turn-taking prediction through a Mixture-of-Experts framework integrating text, audio, and gestures. Experiments show that incorporating semantically guided gestures yields consistent performance gains over baselines, demonstrating their complementary role in multimodal turn-taking."
}

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="suresh-etal-2026-modeling">
    <titleInfo>
      <title>Modeling Turn-Taking with Semantically Informed Gestures</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Varsha</namePart>
      <namePart type="family">Suresh</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">M</namePart>
      <namePart type="given">Hamza</namePart>
      <namePart type="family">Mughal</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Christian</namePart>
      <namePart type="family">Theobalt</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Vera</namePart>
      <namePart type="family">Demberg</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <originInfo>
      <dateIssued>2026-03</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
      <titleInfo>
        <title>Findings of the Association for Computational Linguistics: EACL 2026</title>
      </titleInfo>
      <name type="personal">
        <namePart type="given">Vera</namePart>
        <namePart type="family">Demberg</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Kentaro</namePart>
        <namePart type="family">Inui</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Lluís</namePart>
        <namePart type="family">Marquez</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <originInfo>
        <publisher>Association for Computational Linguistics</publisher>
        <place>
          <placeTerm type="text">Rabat, Morocco</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">conference publication</genre>
      <identifier type="isbn">979-8-89176-386-9</identifier>
    </relatedItem>
    <abstract>In conversation, humans use multimodal cues, such as speech, gestures, and gaze, to manage turn-taking. While linguistic and acoustic features are informative, gestures provide complementary cues for modeling these transitions. To study this, we introduce DnD Gesture++, an extension of the multi-party DnD Gesture corpus enriched with 2,663 semantic gesture annotations spanning iconic, metaphoric, deictic, and discourse types. Using this dataset, we model turn-taking prediction through a Mixture-of-Experts framework integrating text, audio, and gestures. Experiments show that incorporating semantically guided gestures yields consistent performance gains over baselines, demonstrating their complementary role in multimodal turn-taking.</abstract>
    <identifier type="citekey">suresh-etal-2026-modeling</identifier>
    <location>
      <url>https://aclanthology.org/2026.findings-eacl.106/</url>
    </location>
    <part>
      <date>2026-03</date>
      <extent unit="page">
        <start>2034</start>
        <end>2041</end>
      </extent>
    </part>
  </mods>
</modsCollection>

%0 Conference Proceedings
%T Modeling Turn-Taking with Semantically Informed Gestures
%A Suresh, Varsha
%A Mughal, M. Hamza
%A Theobalt, Christian
%A Demberg, Vera
%Y Demberg, Vera
%Y Inui, Kentaro
%Y Marquez, Lluís
%S Findings of the Association for Computational Linguistics: EACL 2026
%D 2026
%8 March
%I Association for Computational Linguistics
%C Rabat, Morocco
%@ 979-8-89176-386-9
%F suresh-etal-2026-modeling
%X In conversation, humans use multimodal cues, such as speech, gestures, and gaze, to manage turn-taking. While linguistic and acoustic features are informative, gestures provide complementary cues for modeling these transitions. To study this, we introduce DnD Gesture++, an extension of the multi-party DnD Gesture corpus enriched with 2,663 semantic gesture annotations spanning iconic, metaphoric, deictic, and discourse types. Using this dataset, we model turn-taking prediction through a Mixture-of-Experts framework integrating text, audio, and gestures. Experiments show that incorporating semantically guided gestures yields consistent performance gains over baselines, demonstrating their complementary role in multimodal turn-taking.
%U https://aclanthology.org/2026.findings-eacl.106/
%P 2034-2041

Markdown (Informal)

[Modeling Turn-Taking with Semantically Informed Gestures](https://aclanthology.org/2026.findings-eacl.106/) (Suresh et al., Findings 2026)

ACL

Varsha Suresh, M. Hamza Mughal, Christian Theobalt, and Vera Demberg. 2026. Modeling Turn-Taking with Semantically Informed Gestures. In Findings of the Association for Computational Linguistics: EACL 2026, pages 2034–2041, Rabat, Morocco. Association for Computational Linguistics.
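
The abstract describes a Mixture-of-Experts framework that fuses text, audio, and gesture features for turn-taking prediction. Only the citation is reproduced here, so the sketch below is a generic, hypothetical illustration of that idea, not the authors' implementation: all layer sizes, names, and the soft-gating design are assumptions.

```python
# Minimal sketch (NOT the paper's code) of a Mixture-of-Experts fusion
# over text, audio, and gesture features for turn-shift prediction.
# All dimensions and the gating design are illustrative assumptions.
import torch
import torch.nn as nn


class ModalityMoE(nn.Module):
    def __init__(self, text_dim=768, audio_dim=512, gesture_dim=256, hidden=128):
        super().__init__()
        # One expert per modality, each projecting into a shared space.
        self.experts = nn.ModuleDict({
            "text": nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU()),
            "audio": nn.Sequential(nn.Linear(audio_dim, hidden), nn.ReLU()),
            "gesture": nn.Sequential(nn.Linear(gesture_dim, hidden), nn.ReLU()),
        })
        # Gating network: soft weights over the three experts,
        # computed from the concatenated raw features.
        self.gate = nn.Linear(text_dim + audio_dim + gesture_dim, 3)
        self.classifier = nn.Linear(hidden, 2)  # turn shift vs. hold

    def forward(self, text, audio, gesture):
        weights = torch.softmax(
            self.gate(torch.cat([text, audio, gesture], dim=-1)), dim=-1
        )
        outs = torch.stack([
            self.experts["text"](text),
            self.experts["audio"](audio),
            self.experts["gesture"](gesture),
        ], dim=1)                                 # (batch, 3, hidden)
        fused = (weights.unsqueeze(-1) * outs).sum(dim=1)
        return self.classifier(fused)             # (batch, 2) logits


# Random features stand in for per-modality encoder outputs.
model = ModalityMoE()
logits = model(torch.randn(4, 768), torch.randn(4, 512), torch.randn(4, 256))
print(logits.shape)  # torch.Size([4, 2])
```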