Shengdi Yin

2025

The proliferation of hate speech has caused significant harm to society. The intensity and directionality of hate are closely tied to the target and argument it is associated with. However, research on hate speech detection in Chinese has lagged behind, and existing datasets lack span-level fine-grained annotations. Furthermore, the lack of research on Chinese hateful slang poses a significant challenge. In this paper, we provide two valuable fine-grained Chinese hate speech detection research resources. First, we construct a Span-level Target-Aware Toxicity Extraction dataset (STATE ToxiCN), which is the first span-level Chinese hate speech dataset. Second, we evaluate the span-level hate speech detection performance of existing models using STATE ToxiCN. Finally, we conduct the first study on Chinese hateful slang and evaluate the ability of LLMs to understand hate semantics. Our work contributes valuable resources and insights to advance span-level hate speech detection in Chinese.

DUTJBD at SemEval-2025 Task 3: A Range of Approaches for Predicting Hallucination Generation in Models
Shengdi Yin | Zekun Wang | Liang Yang | Hongfei Lin
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

This paper details the various methods we explored. Thank you.

Overview of CCL25-Eval Task 10: Fine-grained Chinese Hate Speech Identification Evaluation Task
Junyu Lu | Zewen Bai | Shengdi Yin | Liang Yang | Hongfei Lin
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)

This paper provides an overview of CCL25-Eval Task 10, the Fine-grained Chinese Hate Speech Identification Evaluation. The primary objective of this task is to perform a fine-grained analysis of hateful samples. In addition to binary classification, systems are required to identify and extract the comment target, argument span, and the associated targeted group within each sample, thereby enhancing the model’s capability in fine-grained detection and improving the interpretability of its decisions. In total, more than 300 teams registered for the task, with 100 teams submitting valid results. We present the submitted results and provide a comprehensive analysis of the technical approaches adopted by the top-performing teams. The dataset used in this task has been made available.

Research on Legal Judgment Based on a Dual-System Reasoning Framework
Shengdi Yin | Zewen Bai | Hongfei Lin | Liang Yang
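Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)

Legal judgment prediction is an important task in legal artificial intelligence. This paper proposes an interpretable dual-system reasoning framework based on external knowledge to address the low accuracy and weak interpretability of existing methods on the prison term prediction task. Drawing on dual-system theory from cognitive science, the framework uses the text understanding and generation capabilities of large language models to simulate the decision-making process of human judges handling cases, and ultimately produces prison term predictions with clear reasoning paths. In addition, by constructing a high-quality thinking-augmented dataset and an external knowledge base of law articles, the framework improves the model's explanatory ability and effectively suppresses law-article hallucinations in the law-article judgment model. Experimental results show that the framework significantly improves accuracy and interpretability on the prison term prediction subtask of the CAIL-small and CAIL-big datasets.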

Co-authors

Zekun Wang 1

Jingjie Zeng 1

Haohao Zhu 1
