Lili Shan


2023

融合文本困惑度特征和相似度特征的推特机器人检测方法∗(Twitter robot detection method based on text perplexity feature and similarity feature)
Zhongjie Wang (王钟杰) | Zhaowen Zhang (张朝文) | Wenqi Ding (丁文琪) | Yumeng Fu (付雨濛) | Lili Shan (单丽莉) | Bingquan Liu (刘秉权)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics

“The goal of the Twitter bot detection task is to determine whether a Twitter account belongs to a real person or is an automated bot. As the human-imitation algorithms behind automated accounts iterate rapidly, detecting the newest classes of automated accounts has become increasingly difficult. Recently, pretrained language models have shown excellent performance on natural language generation and other tasks; when these models are used to automatically generate tweet text, they pose a major challenge to Twitter bot detection. This paper finds that abnormally low perplexity and abnormally high similarity consistently appear in the historical tweets of automated accounts from different eras, and that this phenomenon is not affected by pretrained language models. Based on these findings, the paper proposes a method for extracting perplexity and similarity features from historical tweets and designs a feature fusion strategy to better integrate these new features into existing models. The proposed method outperforms existing baselines on the selected datasets and won first place in the social bot detection competition hosted by People's Daily Online (人民网) and organized by the State Key Laboratory of Communication Content Cognition (传播内容认知全国重点实验室).”
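
The abstract centers on two account-level signals: low perplexity and high pairwise similarity in a bot's historical tweets. The sketch below illustrates one plausible way to compute such features; the choice of GPT-2 for perplexity and TF-IDF cosine similarity, as well as the aggregation into account statistics, are assumptions for illustration and not the paper's exact method.

```python
# Sketch of the two account-level signals described in the abstract:
# per-tweet perplexity under a pretrained LM and pairwise similarity
# among a user's historical tweets. GPT-2 and TF-IDF cosine similarity
# are illustrative assumptions, not the authors' exact setup.
import math

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def tweet_perplexity(text: str) -> float:
    """Perplexity of a single tweet under the pretrained LM."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # labels=input_ids yields the mean token cross-entropy
        loss = model(**enc, labels=enc["input_ids"]).loss
    return math.exp(loss.item())

def account_features(tweets: list[str]) -> dict:
    """Aggregate perplexity and similarity features for one account."""
    ppl = [tweet_perplexity(t) for t in tweets]
    tfidf = TfidfVectorizer().fit_transform(tweets)
    sim = cosine_similarity(tfidf)
    n = len(tweets)
    # mean over off-diagonal pairs; per the paper's observation,
    # bots tend toward low perplexity and high similarity
    mean_sim = (sim.sum() - n) / (n * (n - 1)) if n > 1 else 0.0
    return {"mean_ppl": sum(ppl) / n, "min_ppl": min(ppl), "mean_sim": mean_sim}
```

Features of this kind could then be concatenated with an existing detector's representation, which is the role the abstract assigns to the feature fusion strategy.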

2021

ITNLP at SemEval-2021 Task 11: Boosting BERT with Sampling and Adversarial Training for Knowledge Extraction
Genyu Zhang | Yu Su | Changhong He | Lei Lin | Chengjie Sun | Lili Shan
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This paper describes the winning system in the End-to-end Pipeline phase of the NLPContributionGraph task. The system is composed of three BERT-based models, used to extract sentences, entities, and triples respectively. Experiments show that sampling and adversarial training greatly boost the system. In the End-to-end Pipeline phase, our system achieved an average F1 of 0.4703, significantly higher than the second-place system, which achieved an average F1 of 0.3828.
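
The abstract credits adversarial training with much of the gain. A common way to apply adversarial training to BERT is FGM-style perturbation of the word embeddings; the sketch below shows that pattern as an assumption, since the abstract does not specify the exact variant or hyperparameters used.

```python
# Sketch of FGM-style adversarial training on BERT's word embeddings,
# one standard realization of "adversarial training"; the authors' exact
# variant and hyperparameters are assumptions here.
import torch

class FGM:
    """Fast Gradient Method: perturb the embedding table, then restore it."""

    def __init__(self, model, emb_name="word_embeddings", epsilon=1.0):
        self.model = model
        self.emb_name = emb_name
        self.epsilon = epsilon
        self.backup = {}

    def attack(self):
        for name, param in self.model.named_parameters():
            if param.requires_grad and self.emb_name in name and param.grad is not None:
                self.backup[name] = param.data.clone()
                norm = torch.norm(param.grad)
                if norm != 0 and not torch.isnan(norm):
                    # step along the gradient direction in embedding space
                    param.data.add_(self.epsilon * param.grad / norm)

    def restore(self):
        for name, param in self.model.named_parameters():
            if name in self.backup:
                param.data = self.backup[name]
        self.backup = {}

# Usage inside a training step (model, optimizer, batch are placeholders):
#   loss = model(**batch).loss
#   loss.backward()                 # gradients on clean inputs
#   fgm.attack()                    # perturb embeddings
#   model(**batch).loss.backward()  # accumulate adversarial gradients
#   fgm.restore()                   # undo the perturbation
#   optimizer.step(); optimizer.zero_grad()
```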