Unleashing the Power of Emojis in Texts via Self-supervised Graph Pre-Training

Zhou Zhang, Dongzeng Tan, Jiaan Wang, Yilong Chen, Jiarong Xu


Abstract
Emojis have gained immense popularity on social platforms, serving as a common means to supplement or replace text. However, existing data mining approaches generally either ignore emojis entirely or treat them as ordinary Unicode characters, which may limit the model’s ability to grasp the rich semantic information in emojis and the interaction between emojis and texts. Thus, it is necessary to unleash the power of emojis in social media data mining. To this end, we first construct a heterogeneous graph consisting of three types of nodes, i.e., post, word, and emoji nodes, to improve the representation of different elements in posts. The edges are also well-defined to model how these three elements interact with each other. To facilitate the sharing of information among post, word, and emoji nodes, we propose a graph pre-training framework for text and emoji co-modeling, which contains two graph pre-training tasks: node-level graph contrastive learning and edge-level link reconstruction learning. Extensive experiments on the Xiaohongshu and Twitter datasets with two types of downstream tasks demonstrate that our approach achieves significant improvements over previous strong baseline methods.
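The three-node-type graph described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the edge definitions, so the three edge types below (post-word, post-emoji, word-emoji co-occurrence) are assumptions made for illustration, not the authors' construction. The helper names `is_emoji` and `build_graph` are hypothetical, and the emoji check is a crude heuristic based on Unicode categories.

```python
import unicodedata
from collections import defaultdict


def is_emoji(ch):
    # Crude heuristic (assumption): treat Unicode "Symbol, other" characters as emojis.
    return unicodedata.category(ch) == "So"


def build_graph(posts):
    """Return a dict mapping an edge type to a set of (source, target) pairs.

    Edge types are illustrative assumptions, not taken from the paper.
    """
    edges = defaultdict(set)
    for post_id, text in enumerate(posts):
        emojis = {ch for ch in text if is_emoji(ch)}
        # Replace emojis with spaces so they do not stick to neighbouring words.
        cleaned = "".join(" " if is_emoji(ch) else ch for ch in text)
        words = {w.lower() for w in cleaned.split()}
        for w in words:
            edges[("post", "contains", "word")].add((post_id, w))
        for e in emojis:
            edges[("post", "contains", "emoji")].add((post_id, e))
        # Connect every word to every emoji appearing in the same post.
        for w in words:
            for e in emojis:
                edges[("word", "co_occurs", "emoji")].add((w, e))
    return edges


posts = ["great coffee ☕", "coffee time 😀"]
graph = build_graph(posts)
```

In this toy example the word node "coffee" is linked to both posts and to both emojis, which is the kind of shared structure a graph pre-training objective could exploit.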
Anthology ID:
2024.emnlp-main.989
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
17851–17863
URL:
https://aclanthology.org/2024.emnlp-main.989
Cite (ACL):
Zhou Zhang, Dongzeng Tan, Jiaan Wang, Yilong Chen, and Jiarong Xu. 2024. Unleashing the Power of Emojis in Texts via Self-supervised Graph Pre-Training. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 17851–17863, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Unleashing the Power of Emojis in Texts via Self-supervised Graph Pre-Training (Zhang et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-main.989.pdf