IMU2CLIP: Language-grounded Motion Sensor Translation with Multimodal Contrastive Learning

Seungwhan Moon; Andrea Madotto; Zhaojiang Lin; Aparajita Saraf; Amy Bearman; Babak Damavandi

doi:10.18653/v1/2023.findings-emnlp.883

Abstract

We present IMU2CLIP, a novel pre-training approach that aligns Inertial Measurement Unit (IMU) motion sensor recordings with text and video by projecting them into the joint representation space of Contrastive Language-Image Pre-training (CLIP). The proposed approach allows IMU2CLIP to translate human motions (as measured by IMU sensors) into their corresponding textual descriptions and videos – while preserving the transitivity across these modalities. We introduce several new IMU-based Wearable AI applications, such as motion-based media search and LM-based multimodal reasoning with motion sensor data – all using text as the grounding platform. In addition, we show that IMU2CLIP significantly improves downstream performance when fine-tuned for each application, demonstrating its broad utility as a new pre-trained resource. Our code and models will be released publicly.
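
The alignment idea in the abstract can be illustrated with a minimal sketch of a CLIP-style symmetric contrastive (InfoNCE) objective, followed by nearest-neighbor retrieval in the shared space. This is not the authors' implementation: the abstract does not state the loss form, encoders, or hyperparameters, so the function names, the temperature value, and the assumption that `imu_emb[i]` and `clip_emb[i]` come from the same recording are all hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)

def symmetric_contrastive_loss(imu_emb, clip_emb, temperature=0.1):
    """Symmetric InfoNCE over a batch of paired embeddings (assumed setup).

    imu_emb[i] and clip_emb[i] are treated as a positive pair; every other
    pairing in the batch acts as a negative. The temperature is illustrative.
    """
    imu = [l2_normalize(v) for v in imu_emb]
    clip = [l2_normalize(v) for v in clip_emb]
    # Cosine similarities between every IMU and CLIP embedding, scaled.
    logits = [[sum(a * b for a, b in zip(u, c)) / temperature for c in clip]
              for u in imu]
    logits_t = [list(col) for col in zip(*logits)]  # CLIP-to-IMU direction
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))

def retrieve(query_emb, candidates):
    """Return the index of the candidate most similar to the query."""
    q = l2_normalize(query_emb)
    scores = [sum(a * b for a, b in zip(q, l2_normalize(c)))
              for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)
```

Under these assumptions, a batch whose IMU and CLIP embeddings are correctly paired yields a lower loss than the same batch with the pairs shuffled, and `retrieve` stands in for a motion-based search in which an IMU embedding is the query and text or video embeddings are the candidates.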

Anthology ID: 2023.findings-emnlp.883
Volume: Findings of the Association for Computational Linguistics: EMNLP 2023
Month: December
Year: 2023
Address: Singapore
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 13246–13253
URL: https://aclanthology.org/2023.findings-emnlp.883/
DOI: 10.18653/v1/2023.findings-emnlp.883
Cite (ACL): Seungwhan Moon, Andrea Madotto, Zhaojiang Lin, Aparajita Saraf, Amy Bearman, and Babak Damavandi. 2023. IMU2CLIP: Language-grounded Motion Sensor Translation with Multimodal Contrastive Learning. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 13246–13253, Singapore. Association for Computational Linguistics.
Cite (Informal): IMU2CLIP: Language-grounded Motion Sensor Translation with Multimodal Contrastive Learning (Moon et al., Findings 2023)
PDF: https://aclanthology.org/2023.findings-emnlp.883.pdf
