Saba Nazir
2024
How Does an Adjective Sound Like? Exploring Audio Phrase Composition with Textual Embeddings
Saba Nazir | Mehrnoosh Sadrzadeh
Proceedings of the 2024 CLASP Conference on Multimodality and Interaction in Language Learning
We learn matrix representations for the frequent sound-relevant adjectives of English and compose them with vector representations of their nouns. The matrices are learnt jointly from audio and textual data, via linear regression and tensor skipgram. They are assessed using an adjective similarity benchmark and a novel adjective-noun phrase similarity dataset, applied to two tasks: semantic similarity and audio similarity. Joint learning via Tensor Skipgram (TSG) outperforms audio-only models, and matrix composition outperforms both addition and non-compositional phrase vectors.
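To make the composition idea concrete, the following is a minimal sketch of learning one adjective matrix by linear regression and applying it to a noun vector, with additive composition shown for contrast. It is an illustration under stated assumptions rather than the authors' implementation: the data are synthetic and noiseless, the dimensionality is arbitrary, the function names are hypothetical, and neither the audio features nor the tensor skipgram objective is modelled.

```python
import numpy as np


def learn_adjective_matrix(noun_vecs, phrase_vecs):
    """Least-squares fit of a matrix A such that A @ noun ~ phrase.

    noun_vecs:   (n_examples, dim) noun vectors seen with one adjective
    phrase_vecs: (n_examples, dim) vectors of the matching adjective-noun phrases
    """
    # lstsq solves noun_vecs @ X ~ phrase_vecs, so X is the transpose of A.
    a_transposed, *_ = np.linalg.lstsq(noun_vecs, phrase_vecs, rcond=None)
    return a_transposed.T


def compose_matrix(adj_matrix, noun_vec):
    """Matrix composition: the adjective acts as a linear map on the noun."""
    return adj_matrix @ noun_vec


def compose_additive(adj_vec, noun_vec):
    """Additive baseline: the adjective is a vector summed with the noun."""
    return adj_vec + noun_vec


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Synthetic stand-in data (assumption): a hidden "true" adjective map
# generates phrase vectors from random noun vectors, with no noise.
rng = np.random.default_rng(0)
dim, n_nouns = 8, 50
true_matrix = rng.normal(size=(dim, dim))
nouns = rng.normal(size=(n_nouns, dim))
phrases = nouns @ true_matrix.T

learned = learn_adjective_matrix(nouns, phrases)

# Compose the learned adjective with a noun that was not in the training set.
new_noun = rng.normal(size=dim)
predicted = compose_matrix(learned, new_noun)
target = true_matrix @ new_noun
similarity = cosine(predicted, target)
```

In this toy setting the regression recovers the hidden map, so `similarity` is close to 1. With real audio and text embeddings the fit would be approximate, and phrase similarity would be judged against human ratings rather than a known target.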