Shaodi You
2018
JTAV: Jointly Learning Social Media Content Representation by Fusing Textual, Acoustic, and Visual Features
Hongru Liang | Haozheng Wang | Jun Wang | Shaodi You | Zhe Sun | Jin-Mao Wei | Zhenglu Yang
Proceedings of the 27th International Conference on Computational Linguistics
Learning social media content is the basis of many real-world applications, including information retrieval and recommendation systems. In contrast to previous work that focuses mainly on single-modal or bi-modal learning, we propose to learn social media content by jointly fusing textual, acoustic, and visual information (JTAV). We propose effective strategies, namely attBiGRU and DCRNN, to extract fine-grained features of each modality. We also introduce cross-modal fusion and attentive pooling techniques to integrate multi-modal information comprehensively. Extensive experimental evaluations conducted on real-world datasets demonstrate that our proposed model outperforms the state-of-the-art approaches by a large margin.