What is Multimodality?

Letitia Parcalabescu, Nils Trost, Anette Frank


Abstract
Recent years have seen rapid developments in the field of multimodal machine learning, which combines, e.g., vision, text, or speech. In this position paper we explain how the field relies on outdated definitions of multimodality that prove unfit for the machine learning era. We propose a new task-relative definition of (multi)modality in the context of multimodal machine learning that focuses on the representations and information relevant to a given machine learning task. With our new definition of multimodality, we aim to provide a missing foundation for multimodal research, an important component of language grounding, and a crucial milestone towards NLU.
Anthology ID:
2021.mmsr-1.1
Volume:
Proceedings of the 1st Workshop on Multimodal Semantic Representations (MMSR)
Month:
June
Year:
2021
Address:
Groningen, Netherlands (Online)
Editors:
Lucia Donatelli, Nikhil Krishnaswamy, Kenneth Lai, James Pustejovsky
Venue:
MMSR
SIG:
SIGSEM
Publisher:
Association for Computational Linguistics
Pages:
1–10
URL:
https://aclanthology.org/2021.mmsr-1.1
Cite (ACL):
Letitia Parcalabescu, Nils Trost, and Anette Frank. 2021. What is Multimodality?. In Proceedings of the 1st Workshop on Multimodal Semantic Representations (MMSR), pages 1–10, Groningen, Netherlands (Online). Association for Computational Linguistics.
Cite (Informal):
What is Multimodality? (Parcalabescu et al., MMSR 2021)
PDF:
https://aclanthology.org/2021.mmsr-1.1.pdf