How Does an Adjective Sound Like? Exploring Audio Phrase Composition with Textual Embeddings

Saba Nazir, Mehrnoosh Sadrzadeh


Abstract
We learn matrix representations for the frequent sound-relevant adjectives of English and compose them with vector representations of their nouns. The matrices are learnt jointly from audio and textual data, via linear regression and tensor skipgram. They are assessed using an adjective similarity benchmark and also a novel adjective-noun phrase similarity dataset, applied to two tasks: semantic similarity and audio similarity. Joint learning via Tensor Skipgram (TSG) outperforms audio-only models; matrix composition outperforms addition and non-compositional phrase vectors.
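The contrast the abstract draws between matrix composition and addition can be illustrated with a minimal sketch. It is not the authors' implementation: the embedding dimension, the random toy data, and the function names (`learn_adjective_matrix`, `compose_matrix`, `compose_add`, `cosine`) are all assumptions made for illustration, and only the linear-regression route is shown, not tensor skipgram. The adjective matrix is fitted by ordinary least squares so that applying it to a noun vector approximates the observed phrase vector.

```python
import numpy as np


def learn_adjective_matrix(noun_vecs, phrase_vecs):
    """Least-squares fit of a matrix A such that A @ noun ~ phrase.

    noun_vecs and phrase_vecs are (n_pairs, dim) arrays of aligned rows.
    """
    # Solve noun_vecs @ X = phrase_vecs for X; the adjective matrix is X.T.
    X, *_ = np.linalg.lstsq(noun_vecs, phrase_vecs, rcond=None)
    return X.T


def compose_matrix(adj_matrix, noun_vec):
    """Matrix composition: the adjective acts as a linear map on the noun."""
    return adj_matrix @ noun_vec


def compose_add(adj_vec, noun_vec):
    """Additive baseline: element-wise sum of adjective and noun vectors."""
    return adj_vec + noun_vec


def cosine(u, v):
    """Cosine similarity, a common score for phrase similarity tasks."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Toy data (assumption): random vectors stand in for real audio/text embeddings.
rng = np.random.default_rng(0)
dim, n_pairs = 4, 50
true_A = rng.normal(size=(dim, dim))   # hypothetical "ground truth" adjective
nouns = rng.normal(size=(n_pairs, dim))
phrases = nouns @ true_A.T             # noiseless phrase vectors for the toy case

A = learn_adjective_matrix(nouns, phrases)
predicted = compose_matrix(A, nouns[0])
similarity = cosine(predicted, phrases[0])
```

In this noiseless toy setting the fitted matrix reproduces the phrase vectors almost exactly. With real embeddings the fit is only approximate, and the quality of the composed vectors is what a phrase similarity benchmark would measure.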
Anthology ID:
2024.clasp-1.3
Volume:
Proceedings of the 2024 CLASP Conference on Multimodality and Interaction in Language Learning
Month:
October
Year:
2024
Address:
Gothenburg, Sweden
Editors:
Amy Qiu, Bill Noble, David Pagmar, Vladislav Maraev, Nikolai Ilinykh
Venue:
CLASP
SIG:
SIGSEM
Publisher:
Association for Computational Linguistics
Pages:
13–18
URL:
https://aclanthology.org/2024.clasp-1.3
Cite (ACL):
Saba Nazir and Mehrnoosh Sadrzadeh. 2024. How Does an Adjective Sound Like? Exploring Audio Phrase Composition with Textual Embeddings. In Proceedings of the 2024 CLASP Conference on Multimodality and Interaction in Language Learning, pages 13–18, Gothenburg, Sweden. Association for Computational Linguistics.
Cite (Informal):
How Does an Adjective Sound Like? Exploring Audio Phrase Composition with Textual Embeddings (Nazir & Sadrzadeh, CLASP 2024)
PDF:
https://aclanthology.org/2024.clasp-1.3.pdf