Chenhao Wu
2024
Continuous Relational Diffusion Driven Topic Model with Multi-grained Text for Microblog
Chenhao Wu
|
Ruifang He
|
Chang Liu
|
Bo Wang
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Topic model is a statistical model that leverages unsupervised learning to mine hidden topics in document collections. The data sparsity and colloquialism of social texts make it difficult to accurately mine the topics. Traditional methods assume that there are only 0/1-state relationships between the two parties in the social networks, but the relationship status in real life is more complicated, such as continuously changing relationships with different degrees of intimacy. This paper proposes a continuous relational diffusion driven topic model (CRTM) with multi-grained text for microblog to realize the continuous representation of the relationship state and make up for the context and structural information lost by previous representation methods. Multi-grained text representation learning distinguishes the impact of formal and informal expression on the topics further and alleviates colloquialism problems. Specifically, based on the original social network, the reconstructed social network with continuous relationship status is obtained by using information diffusion technology. The graph convolution model is utilized to learn node embeddings through the new social network. Finally, the neural variational inference is applied to generate topics according to continuous relationships. We validate CRTM on three real datasets, and the experimental results show the effectiveness of the scheme.