Jiasheng Si

2025

Plan Dynamically, Express Rhetorically: A Debate-Driven Rhetorical Framework for Argumentative Writing
Xueguan Zhao | Wenpeng Lu | Chaoqun Zheng | Weiyu Zhang | Jiasheng Si | Deyu Zhou
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Argumentative essay generation (AEG) is a complex task that requires advanced semantic understanding, logical reasoning, and organized integration of perspectives. Despite showing a promising performance, current efforts often overlook the dynamical and hierarchical nature of structural argumentative planning, and struggle with flexible rhetorical expression, leading to limited argument divergence and rhetorical optimization. Inspired by human debate behavior and Bitzer’s rhetorical situation theory, we propose a debate-driven rhetorical framework for argumentative writing. The uniqueness lies in three aspects: (1) dynamic assesses the divergence of viewpoints and progressively reveals the hierarchical outline of arguments based on a depth-then-breadth paradigm, improving the perspective divergence within argumentation; (2) simulates human debate through iterative defender-attacker interactions, improving the logical coherence of arguments; (3) incorporates Bitzer’s rhetorical situation theory to flexibly select appropriate rhetorical techniques, enabling the rhetorical expression. Experiments on four benchmarks validate that our approach significantly improves logical depth, argumentative diversity, and rhetorical persuasiveness over existing state-of-the-art models.

pdf bib abs

ClimateViz: A Benchmark for Statistical Reasoning and Fact Verification on Scientific Charts
Ruiran Su | Jiasheng Si | Zhijiang Guo | Janet B. Pierrehumbert
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Scientific fact-checking has largely focused on textual and tabular sources, neglecting scientific charts—a primary medium for conveying quantitative evidence and supporting statistical reasoning in research communication. We introduce ClimateViz, the first large-scale benchmark for scientific fact-checking grounded in real-world, expert-curated scientific charts. ClimateViz comprises 49,862 claims paired with 2,896 visualizations, each labeled as support, refute, or not enough information. To enable interpretable verification, each instance includes structured knowledge graph explanations that capture statistical patterns, temporal trends, spatial comparisons, and causal relations. We conduct a comprehensive evaluation of state-of-the-art multimodal large language models, including proprietary and open-source ones, under zero-shot and few-shot settings. Our results show that current models struggle to perform fact-checking when statistical reasoning over charts is required: even the best-performing systems, such as Gemini 2.5 and InternVL 2.5, achieve only 76.2–77.8% accuracy in label-only output settings, which is far below human performance (89.3% and 92.7%). While few-shot prompting yields limited improvements, explanation-augmented outputs significantly enhance performance in some closed-source models, notably o3 and Gemini 2.5.

2024

pdf bib abs

Denoising Rationalization for Multi-hop Fact Verification via Multi-granular Explainer
Jiasheng Si | Yingjie Zhu | Wenpeng Lu | Deyu Zhou
Findings of the Association for Computational Linguistics: EMNLP 2024

The success of deep learning models on multi-hop fact verification has prompted researchers to understand the behavior behind their veracity. One feasible way is erasure search: obtaining the rationale by entirely removing a subset of input without compromising verification accuracy. Despite extensive exploration, current rationalization methods struggle to discern nuanced composition within the correlated evidence, which inevitably leads to noise rationalization in multi-hop scenarios. To address this issue, this paper explores the multi-granular rationale extraction method, aiming to realize the denoising rationalization for multi-hop fact verification. Specifically, given a pretrained veracity prediction model, two independent external explainers are introduced and trained collaboratively to enhance the discriminating ability by imposing varied constraints. Meanwhile, three key properties (Fidelity, Consistency, Salience) are introduced to regularize the denoising and faithful rationalization process. Additionally, a new Noiselessness metric is proposed to measure the purity of the rationales. Experimental results on three multi-hop fact verification datasets show that the proposed approach outperforms 12 baselines.

pdf bib abs

Chain-of-Thought (CoT) has become a vital technique for enhancing the performance of Large Language Models (LLMs), attracting increasing attention from researchers. One stream of approaches focuses on the iterative enhancement of LLMs by continuously verifying and refining their reasoning outputs for desired quality. Despite its impressive results, this paradigm faces two critical issues: (1) Simple verification methods: The current paradigm relies solely on a single verification method. (2) Wrong Information Ignorance: Traditional paradigms directly ignore wrong information during reasoning and refine the logic paths from scratch each time. To address these challenges, we propose Wrong-of-Thought (WoT), which includes two core modules: (1) Multi-Perspective Verification: A multi-perspective verification method for accurately refining the reasoning process and result, and (2) Wrong Information Utilization: Utilizing wrong information to alert LLMs and reduce the probability of LLMs making same mistakes. Experiments on 8 popular datasets and 5 LLMs demonstrate that WoT surpasses all previous baselines. In addition, WoT exhibits powerful capabilities in difficult computation tasks.

pdf bib abs

With the growing complexity of fact verification tasks, the concern with “thoughtful” reasoning capabilities is increasing. However, recent fact verification benchmarks mainly focus on checking a narrow scope of semantic factoids within claims and lack an explicit logical reasoning process. In this paper, we introduce CHECKWHY, a challenging dataset tailored to a novel causal fact verification task: checking the truthfulness of the causal relation within claims through rigorous reasoning steps. CHECKWHY consists of over 19K “why” claim-evidence- argument structure triplets with supports, refutes, and not enough info labels. Each argument structure is composed of connected evidence, representing the reasoning process that begins with foundational evidence and progresses toward claim establishment. Through extensive experiments on state-of-the-art models, we validate the importance of incorporating the argument structure for causal fact verification. Moreover, the automated and human evaluation of argument structure generation reveals the difficulty in producing satisfying argument structure by fine-tuned models or Chain-of-Thought prompted LLMs, leaving considerable room for future improvements.

pdf bib abs

Medical entity disambiguation (MED) aims to ground medical mentions in text with ontological entities in knowledge bases (KBs). A notable challenge of MED is the long medical text usually contains multiple entities’ mentions with intricate correlations. However, limited by computation overhead, many existing methods consider only a single candidate entity mention during the disambiguation process. As such, they focus only on local MED optimal while ignoring the sole-mention disambiguation possibly boosted by richer context from other mentions’ disambiguating processes – missing global optimal on entity combination in the text. Motivated by this, we propose a new approach called Extractive Medical Entity Disambiguation with Memory Mechanism and Memorized Entity Information (M3E). Specifically, we reformulate MED as a text extraction task, which simultaneously accepts the context of medical mentions, all possible candidate entities, and entity definitions, and it is then trained to extract the text span corresponding to the correct entity. Upon our new formulation, 1) to alleviate the computation overhead from the enriched context, we devise a memory mechanism module that performs memory caching, retrieval, fusion and cross-network residual; and 2) to utilize the disambiguation clues from other mentions, we design an auxiliary disambiguation module that employs a gating mechanism to assist the disambiguation of remaining mentions. Extensive experiments on two benchmark datasets demonstrate the superiority of M3E over the state-of-the-art MED methods on all metrics.

2023

pdf bib abs

EXPLAIN, EDIT, GENERATE: Rationale-Sensitive Counterfactual Data Augmentation for Multi-hop Fact Verification
Yingjie Zhu | Jiasheng Si | Yibo Zhao | Haiyang Zhu | Deyu Zhou | Yulan He
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Automatic multi-hop fact verification task has gained significant attention in recent years. Despite impressive results, these well-designed models perform poorly on out-of-domain data. One possible solution is to augment the training data with counterfactuals, which are generated by minimally altering the causal features of the original data. However, current counterfactual data augmentation techniques fail to handle multi-hop fact verification due to their incapability to preserve the complex logical relationships within multiple correlated texts. In this paper, we overcome this limitation by developing a rationale-sensitive method to generate linguistically diverse and label-flipping counterfactuals while preserving logical relationships. In specific, the diverse and fluent counterfactuals are generated via an Explain-Edit-Generate architecture. Moreover, the checking and filtering modules are proposed to regularize the counterfactual data with logical relations and flipped labels. Experimental results show that the proposed approach outperforms the SOTA baselines and can generate linguistically diverse counterfactual data without disrupting their logical relationships.

2021

pdf bib abs

Topic-Aware Evidence Reasoning and Stance-Aware Aggregation for Fact Verification
Jiasheng Si | Deyu Zhou | Tongzhe Li | Xingyu Shi | Yulan He
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Fact verification is a challenging task that requires simultaneously reasoning and aggregating over multiple retrieved pieces of evidence to evaluate the truthfulness of a claim. Existing approaches typically (i) explore the semantic interaction between the claim and evidence at different granularity levels but fail to capture their topical consistency during the reasoning process, which we believe is crucial for verification; (ii) aggregate multiple pieces of evidence equally without considering their implicit stances to the claim, thereby introducing spurious information. To alleviate the above issues, we propose a novel topic-aware evidence reasoning and stance-aware aggregation model for more accurate fact verification, with the following four key properties: 1) checking topical consistency between the claim and evidence; 2) maintaining topical coherence among multiple pieces of evidence; 3) ensuring semantic similarity between the global topic information and the semantic representation of evidence; 4) aggregating evidence based on their implicit stances to the claim. Extensive experiments conducted on the two benchmark datasets demonstrate the superiority of the proposed model over several state-of-the-art approaches for fact verification. The source code can be obtained from https://github.com/jasenchn/TARSA.