Minjie Qiang
2025
Exploring Unified Training Framework for Multimodal User Profiling
Minjie Qiang | Zhongqing Wang | Shoushan Li | Guodong Zhou
Proceedings of the 31st International Conference on Computational Linguistics
With the emergence of social media and e-commerce platforms, accurate user profiling has become increasingly vital for recommendation systems and personalized services. Recent studies have focused on generating detailed user profiles by extracting various aspects of user attributes from textual reviews. Nevertheless, these investigations have not fully exploited the potential of the abundant multimodal data at hand. In this study, we propose a novel task called multimodal user profiling. This task emphasizes the utilization of both review texts and their accompanying images to create comprehensive user profiles. By integrating textual and visual data, we leverage their complementary strengths, enabling the generation of more holistic user representations. Additionally, we explore a unified joint training framework with various multimodal training strategies that incorporate users’ historical review texts and images for user profile generation. Our experimental results underscore the significance of multimodal data in enhancing user profile generation and demonstrate the effectiveness of the proposed unified joint training approach.
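Below is a minimal PyTorch sketch of the kind of joint text-and-image training the abstract describes: historical review texts and images are encoded, fused in one model, and trained end to end toward profile generation. The paper's actual architecture is not specified in the abstract, so every module name, dimension, and fusion choice here is an illustrative assumption.

```python
# Hypothetical sketch of unified joint multimodal training for user
# profile generation. All modules and sizes are assumptions, not the
# paper's method; real encoders (e.g., a text LM and a vision backbone)
# would replace the dummy feature tensors below.
import torch
import torch.nn as nn

class MultimodalProfiler(nn.Module):
    """Fuses a user's review-text and image features into a joint
    representation, then predicts profile tokens (illustrative only)."""
    def __init__(self, text_dim=768, image_dim=512, hidden=512, vocab=32000):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, hidden)    # project text features
        self.image_proj = nn.Linear(image_dim, hidden)  # project image features
        self.fuse = nn.TransformerEncoder(              # joint cross-modal context
            nn.TransformerEncoderLayer(d_model=hidden, nhead=8,
                                       batch_first=True),
            num_layers=2)
        self.lm_head = nn.Linear(hidden, vocab)         # profile-token logits

    def forward(self, text_feats, image_feats):
        # text_feats:  (batch, n_reviews, text_dim)
        # image_feats: (batch, n_images, image_dim)
        tokens = torch.cat([self.text_proj(text_feats),
                            self.image_proj(image_feats)], dim=1)
        return self.lm_head(self.fuse(tokens))          # (batch, seq, vocab)

# One joint training step over both modalities with a single loss.
model = MultimodalProfiler()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
text_feats = torch.randn(2, 5, 768)        # dummy features for 5 reviews
image_feats = torch.randn(2, 3, 512)       # dummy features for 3 images
target = torch.randint(0, 32000, (2, 8))   # dummy profile tokens
logits = model(text_feats, image_feats)[:, :8, :]
loss = nn.functional.cross_entropy(logits.reshape(-1, 32000),
                                   target.reshape(-1))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Because both modalities pass through one fused encoder and one loss, their complementary signals are learned jointly rather than in separate per-modality stages; a production version would generate the profile autoregressively rather than predicting a fixed-length token block.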
2024
Employing Glyphic Information for Chinese Event Extraction with Vision-Language Model
Xiaoyi Bao | Jinghang Gu | Zhongqing Wang | Minjie Qiang | Chu-Ren Huang
Findings of the Association for Computational Linguistics: EMNLP 2024
As a complex task that requires rich information input, event extraction has drawn on features from many aspects. However, most previous works have ignored the value of glyphs, which carry rich semantic information that pre-trained embeddings cannot fully express in hieroglyphic languages like Chinese. We argue that, compared with combining ever more sophisticated textual features, glyphic information from the visual modality can provide extra and more direct semantic cues for extracting events. Motivated by this, we propose a glyphic multimodal Chinese event extraction model that uses hieroglyphic images to capture the intra- and inter-character morphological structure of the sequence. Extensive experiments establish new state-of-the-art performance on the ACE2005 Chinese and KBP Eval 2017 datasets, which underscores the effectiveness of our proposed glyphic event extraction model; more importantly, the glyphic features can be obtained at nearly zero cost.
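The core idea of pairing character embeddings with rendered glyph images can be sketched as follows. The abstract does not give the model's actual design, so the CNN, dimensions, and fusion layer below are all hypothetical; the "nearly zero cost" claim corresponds to the fact that glyph images can be rendered once from a font at preprocessing time.

```python
# Illustrative sketch: fuse glyph-image features with character
# embeddings for Chinese event extraction. Every module and size is an
# assumption; only the general glyph-plus-text idea comes from the paper.
import torch
import torch.nn as nn

class GlyphAugmentedEncoder(nn.Module):
    def __init__(self, vocab=21128, emb_dim=256, glyph_dim=64):
        super().__init__()
        self.char_emb = nn.Embedding(vocab, emb_dim)
        # Tiny CNN over rendered character images (e.g., 32x32 grayscale),
        # capturing intra-character morphological structure.
        self.glyph_cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, glyph_dim))
        self.fuse = nn.Linear(emb_dim + glyph_dim, emb_dim)

    def forward(self, char_ids, glyph_images):
        # char_ids: (batch, seq); glyph_images: (batch, seq, 1, 32, 32)
        b, s = char_ids.shape
        glyph = self.glyph_cnn(glyph_images.view(b * s, 1, 32, 32))
        glyph = glyph.view(b, s, -1)                    # per-character glyph features
        return self.fuse(torch.cat([self.char_emb(char_ids), glyph], dim=-1))

enc = GlyphAugmentedEncoder()
out = enc(torch.randint(0, 21128, (2, 10)), torch.randn(2, 10, 1, 32, 32))
print(out.shape)  # torch.Size([2, 10, 256])
```

The fused per-character representations would then feed a standard event-extraction head (trigger and argument classifiers); inter-character structure is left to whatever sequence encoder sits on top.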
Co-authors
- Zhongqing Wang (王中卿) 2
- Xiaoyi Bao 1
- Jinghang Gu 1
- Chu-Ren Huang 1
- Shoushan Li (李寿山) 1
- Guodong Zhou 1