Huijia Li

2026

Legal case facts are often lengthy, complex, and difficult to process, posing challenges for legal judgment prediction. Although recent advances leverage large language models (LLMs) for legal reasoning, they face high computational costs and information degradation when handling long cases. Previous approaches, such as architectural modifications and text compression methods, reduce computational complexity to some extent but still struggle to effectively capture legally salient information in complex cases. We propose a legal knowledge–adaptive compression framework for long legal judgment prediction that integrates domain-specific legal knowledge to guide adaptive context compression. Our approach selectively retains legally relevant information while reducing redundant or less informative content, enabling efficient and accurate long-context reasoning. We evaluate the proposed framework on four real-world datasets spanning multiple jurisdictions and languages. Experimental results demonstrate that our method outperforms existing approaches in both prediction performance and computational efficiency.
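
The abstract does not specify how legal knowledge scores content. The following is a minimal, hypothetical sketch of one way knowledge-guided compression could work, not the authors' framework. It assumes a small lexicon (`LEGAL_TERMS`, illustrative weights) stands in for domain knowledge. Sentences are scored by lexicon matches, and the highest-scoring ones are kept within a word budget, in their original order.

```python
import re

# Hypothetical legal-knowledge lexicon: term -> salience weight (illustrative only).
LEGAL_TERMS = {"intent": 2.0, "injury": 2.0, "weapon": 1.5, "stolen": 1.5, "confessed": 1.0}


def score_sentence(sentence, lexicon):
    """Sum the weights of lexicon terms that occur in the sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return sum(lexicon.get(t, 0.0) for t in tokens)


def compress(text, lexicon, budget):
    """Keep legally salient sentences whose total word count fits `budget`,
    then restore their original order (a greedy selection, not an optimum)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Rank sentence indices by salience; sorted() is stable, so ties keep text order.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score_sentence(sentences[i], lexicon),
                    reverse=True)
    kept, used = [], 0
    for i in ranked:
        n_words = len(sentences[i].split())
        # Drop sentences with no legal signal and any that would exceed the budget.
        if score_sentence(sentences[i], lexicon) > 0 and used + n_words <= budget:
            kept.append(i)
            used += n_words
    return " ".join(sentences[i] for i in sorted(kept))
```

A greedy budgeted selection keeps the sketch short; a learned relevance model or token-level pruning would be natural substitutes for the lexicon score.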


As the real world continuously evolves, facts change over time, requiring large language models to rely simultaneously on internal parametric knowledge and externally retrieved evidence for temporal reasoning. However, external knowledge may be inaccurate, while internal knowledge can become outdated. Temporal inconsistencies between these heterogeneous sources can accumulate during multi-step reasoning, leading to Time-Anchor Drift (TAD), a phenomenon in which an incorrect temporal reference is established early and subsequently propagated, ultimately causing reasoning failure. To address this issue, we propose M-TRACE, a multi-agent reasoning framework for temporal knowledge conflicts. M-TRACE explicitly maintains a State Timeline to perform step-wise temporal alignment and coexistence checks between internal states and external evidence. Detected conflicts are summarized into a structured Conflict Report, which guides conflict-aware final reasoning. We further introduce TimeConfQA, a temporal question answering benchmark with controlled temporal knowledge conflicts. Experimental results show that M-TRACE effectively reduces time-anchor drift and consistently improves performance on complex temporal question answering tasks, demonstrating the value of explicit conflict modeling for temporal reasoning. The code can be found at https://github.com/h-yii/M-TRACE.
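
To make the idea of a coexistence check concrete, here is a minimal, hypothetical sketch. It is not the M-TRACE implementation (see the linked repository for that). It assumes each fact carries a validity interval in years and a source tag, and that a (subject, relation) slot holds only one value at a time. Two different values on overlapping intervals therefore cannot coexist and are logged as a conflict entry. `Fact`, `StateTimeline`, and `conflict_report` are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Fact:
    subject: str
    relation: str
    value: str
    start: int   # first year the fact holds (inclusive)
    end: int     # last year the fact holds (inclusive)
    source: str  # e.g. "internal" (parametric) or "external" (retrieved)


@dataclass
class StateTimeline:
    facts: list = field(default_factory=list)

    def add(self, fact):
        """Record a fact and return the earlier facts it cannot coexist with."""
        clashes = [old for old in self.facts if self._conflicts(old, fact)]
        self.facts.append(fact)
        return clashes

    @staticmethod
    def _conflicts(a, b):
        # Assumption: one value per (subject, relation) at any time, so different
        # values on overlapping intervals cannot coexist.
        same_slot = (a.subject, a.relation) == (b.subject, b.relation)
        overlap = a.start <= b.end and b.start <= a.end
        return same_slot and overlap and a.value != b.value


def conflict_report(facts_in_order):
    """Walk reasoning steps in order and collect structured conflict entries."""
    timeline, report = StateTimeline(), []
    for step, fact in enumerate(facts_in_order):
        for old in timeline.add(fact):
            report.append({
                "step": step,
                "slot": (fact.subject, fact.relation),
                "values": (old.value, fact.value),
                "sources": (old.source, fact.source),
                "overlap": (max(old.start, fact.start), min(old.end, fact.end)),
            })
    return report
```

For example, an internal fact "Club X head coach: Alice, 2015 to 2020" followed by retrieved evidence "Bob, 2019 to 2023" yields one entry flagging the 2019 to 2020 overlap, while a later "Carol, 2024 to 2025" raises nothing.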


Large language models (LLMs) are playing an increasingly pivotal role in LegalAI. However, existing benchmarks are primarily tailored to legal professionals, emphasizing deep reasoning and explainability. Public-facing legal applications, by contrast, demand outputs that are direct, actionable, and accessible, a need largely overlooked by current evaluation frameworks. To bridge this gap, we propose Pub-LawBench, a public-oriented LegalAI benchmark grounded in legal functionalism and genre analysis. Specifically, we categorize public legal demands into two core tasks: Instant Question Answering and Legal Text Generation. We further introduce three public-oriented evaluation dimensions: legal normativity, content relevance, and format usability, which collectively assess the practical validity and user readiness of model outputs. To reflect how lay users actually interact with these systems, we evaluate 17 LLMs on Pub-LawBench using only simple prompts and Chain-of-Thought prompting in a vanilla inference setting, excluding complex techniques such as RAG or agent-based methods that are inaccessible to non-experts. Experiments reveal the limitations of current LLMs in delivering effective public-oriented legal assistance, highlighting the need for more user-centric model development and benchmarking. Our code and datasets are available for review at https://anonymous.4open.science/r/P-LawBench-E565/.


Drafting a defence opinion is an essential step in criminal proceedings, yet it has not been systematically formulated or evaluated as a specific LegalAI task. Grounded in legal principles and practice, we define this task as generating a structured defence opinion conditioned jointly on an indictment and the defendant's stated opinion, which often present conflicting claims. We formalize this setting as a dual-perspective generation problem and introduce DefGen-Bench, a benchmark comprising several Chinese criminal cases with expert-reviewed reference defence opinions. We evaluate eight large language models (LLMs) on this task and observe that existing models tend to mirror the defendant's opinion, thereby overlooking more appropriate defence strategies. To address this challenge, we propose Knowledge-Enhanced Highlighted Indictment (KHI), a legal knowledge–guided input enhancement method applicable to both open- and closed-source LLMs. Experiments demonstrate consistent improvements across all evaluated LLMs, validating the effectiveness of the proposed approach.
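
As an illustration of what a knowledge-guided input enhancement might look like, the following is a minimal, hypothetical sketch and not the KHI method itself. It assumes a toy knowledge base (`CHARGE_NOTES`, illustrative notes rather than statute text) keyed by charge name, and uses English text for readability even though DefGen-Bench is built from Chinese cases. Charge names found in the indictment are marked with `[[...]]`, the matching notes are appended, and the result is placed ahead of the defendant's opinion in the prompt.

```python
import re

# Hypothetical knowledge base: charge name -> short legal note (illustrative only).
CHARGE_NOTES = {
    "theft": "Theft: check the amount involved and whether the property was returned.",
    "fraud": "Fraud: check whether the victim was deceived into handing over property.",
}


def highlight_indictment(indictment, notes):
    """Mark known charge names in the indictment and append their notes."""
    found = []

    def mark(match):
        name = match.group(0).lower()
        if name not in found:
            found.append(name)
        return f"[[{match.group(0)}]]"

    pattern = re.compile(r"\b(" + "|".join(map(re.escape, notes)) + r")\b", re.IGNORECASE)
    marked = pattern.sub(mark, indictment)
    if not found:
        return marked
    notes_block = "\n".join(notes[name] for name in found)
    return f"{marked}\n\nLegal knowledge:\n{notes_block}"


def build_prompt(indictment, defendant_opinion, notes):
    """Assemble a generation prompt with the enhanced indictment placed first."""
    return ("Indictment (highlighted):\n" + highlight_indictment(indictment, notes)
            + "\n\nDefendant's opinion:\n" + defendant_opinion
            + "\n\nWrite a structured defence opinion.")
```

Because the enhancement only rewrites the input text, a sketch like this works the same way with open- or closed-source models.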


Venues: ACL (2), Findings (2)
