Minnan Luo


2023

pdf bib
BotPercent: Estimating Bot Populations in Twitter Communities
Zhaoxuan Tan | Shangbin Feng | Melanie Sclar | Herun Wan | Minnan Luo | Yejin Choi | Yulia Tsvetkov
Findings of the Association for Computational Linguistics: EMNLP 2023

Twitter bot detection is vital in combating misinformation and safeguarding the integrity of social media discourse. While malicious bots are becoming more and more sophisticated and personalized, standard bot detection approaches are still agnostic to social environments (henceforth, communities) the bots operate at. In this work, we introduce community-specific bot detection, estimating the percentage of bots given the context of a community. Our method—BotPercent—is an amalgamation of Twitter bot detection datasets and feature-, text-, and graph-based models, adjusted to a particular community on Twitter. We introduce an approach that performs confidence calibration across bot detection models, which addresses generalization issues in existing community-agnostic models targeting individual bots and leads to more accurate community-level bot estimations. Experiments demonstrate that BotPercent achieves state-of-the-art performance in community-level Twitter bot detection across both balanced and imbalanced class distribution settings, presenting a less biased estimator of Twitter bot populations within the communities we analyze. We then analyze bot rates in several Twitter groups, including users who engage with partisan news media, political communities in different countries, and more. Our results reveal that the presence of Twitter bots is not homogeneous, but exhibiting a spatial-temporal distribution with considerable heterogeneity that should be taken into account for content moderation and social media policy making. The implementation of BotPercent is available at https://github.com/TamSiuhin/BotPercent.

pdf bib
Detecting Spoilers in Movie Reviews with External Movie Knowledge and User Networks
Heng Wang | Wenqian Zhang | Yuyang Bai | Zhaoxuan Tan | Shangbin Feng | Qinghua Zheng | Minnan Luo
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Online movie review platforms are providing crowdsourced feedback for the film industry and the general public, while spoiler reviews greatly compromise user experience. Although preliminary research efforts were made to automatically identify spoilers, they merely focus on the review content itself, while robust spoiler detection requires putting the review into the context of facts and knowledge regarding movies, user behavior on film review platforms, and more. In light of these challenges, we first curate a large-scale network-based spoiler detection dataset LCS and a comprehensive and up-to-date movie knowledge base UKM. We then propose MVSD, a novel spoiler detection model that takes into account the external knowledge about movies and user activities on movie review platforms. Specifically, MVSD constructs three interconnecting heterogeneous information networks to model diverse data sources and their multi-view attributes, while we design and employ a novel heterogeneous graph neural network architecture for spoiler detection as node-level classification. Extensive experiments demonstrate that MVSD advances the state-of-the-art on two spoiler detection datasets, while the introduction of external knowledge and user interactions help ground robust spoiler detection.

pdf bib
BIC: Twitter Bot Detection with Text-Graph Interaction and Semantic Consistency
Zhenyu Lei | Herun Wan | Wenqian Zhang | Shangbin Feng | Zilong Chen | Jundong Li | Qinghua Zheng | Minnan Luo
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Twitter bots are automatic programs operated by malicious actors to manipulate public opinion and spread misinformation. Research efforts have been made to automatically identify bots based on texts and networks on social media. Existing methods only leverage texts or networks alone, and while few works explored the shallow combination of the two modalities, we hypothesize that the interaction and information exchange between texts and graphs could be crucial for holistically evaluating bot activities on social media. In addition, according to a recent survey (Cresci, 2020), Twitter bots are constantly evolving while advanced bots steal genuine users’ tweets and dilute their malicious content to evade detection. This results in greater inconsistency across the timeline of novel Twitter bots, which warrants more attention. In light of these challenges, we propose BIC, a Twitter Bot detection framework with text-graph Interaction and semantic Consistency. Specifically, in addition to separately modeling the two modalities on social media, BIC employs a text-graph interaction module to enable information exchange across modalities in the learning process. In addition, given the stealing behavior of novel Twitter bots, BIC proposes to model semantic consistency in tweets based on attention weights while using it to augment the decision process. Extensive experiments demonstrate that BIC consistently outperforms state-of-the-art baselines on two widely adopted datasets. Further analyses reveal that text-graph interactions and modeling semantic consistency are essential improvements and help combat bot evolution.

2022

pdf bib
KCD: Knowledge Walks and Textual Cues Enhanced Political Perspective Detection in News Media
Wenqian Zhang | Shangbin Feng | Zilong Chen | Zhenyu Lei | Jundong Li | Minnan Luo
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Political perspective detection has become an increasingly important task that can help combat echo chambers and political polarization. Previous approaches generally focus on leveraging textual content to identify stances, while they fail to reason with background knowledge or leverage the rich semantic and syntactic textual labels in news articles. In light of these limitations, we propose KCD, a political perspective detection approach to enable multi-hop knowledge reasoning and incorporate textual cues as paragraph-level labels. Specifically, we firstly generate random walks on external knowledge graphs and infuse them with news text representations. We then construct a heterogeneous information network to jointly model news content as well as semantic, syntactic and entity cues in news articles. Finally, we adopt relational graph neural networks for graph-level representation learning and conduct political perspective detection. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods on two benchmark datasets. We further examine the effect of knowledge walks and textual cues and how they contribute to our approach’s data efficiency.

pdf bib
PAR: Political Actor Representation Learning with Social Context and Expert Knowledge
Shangbin Feng | Zhaoxuan Tan | Zilong Chen | Ningnan Wang | Peisheng Yu | Qinghua Zheng | Xiaojun Chang | Minnan Luo
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Modeling the ideological perspectives of political actors is an essential task in computational political science with applications in many downstream tasks. Existing approaches are generally limited to textual data and voting records, while they neglect the rich social context and valuable expert knowledge for holistic ideological analysis. In this paper, we propose PAR, a Political Actor Representation learning framework that jointly leverages social context and expert knowledge. Specifically, we retrieve and extract factual statements about legislators to leverage social context information. We then construct a heterogeneous information network to incorporate social context and use relational graph neural networks to learn legislator representations. Finally, we train PAR with three objectives to align representation learning with expert knowledge, model ideological stance consistency, and simulate the echo chamber phenomenon. Extensive experiments demonstrate that PAR is better at augmenting political text understanding and successfully advances the state-of-the-art in political perspective detection and roll call vote prediction. Further analysis proves that PAR learns representations that reflect the political reality and provide new insights into political behavior.

2020

pdf bib
基于有向异构图的发票明细税收分类方法(Tax Classification of Invoice Details Based on Directed Heterogeneous Graph)
Peiyao Zhao (赵珮瑶) | Qinghua Zheng (郑庆华) | Bo Dong (董博) | Jianfei Ruan (阮建飞) | Minnan Luo (罗敏楠)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

税收是国家赖以生存的物质基础。为加快税收现代化,方便纳税人便捷、规范开具增值税发票,国税总局规定纳税人在税控系统开票前选择发票明细对应的税收分类才可正常开具发票。提高税收分类的准确度,是构建税收风险指标和分析纳税人行为特征的重要基础。基于此,本文提出了一种基于有向异构图的短文本分类模型(Heterogeneous Directed Graph Attenton Network,HDGAT),利用发票明细间的有向信息建模,引入外部知识,显著地提高了发票明细的税收分类准确度。