Myeong-Cheol Kang
2026
Can We Entrust Justice to AI?: How Persona Traps Contaminate Reasoning in Criminal Investigation
Jaewook Lee | Myeong-Cheol Kang | Jong-hun Shin
Findings of the Association for Computational Linguistics: ACL 2026
If large language models (LLMs) are deployed to analyze evidence and evaluate suspects in criminal investigations, are they free from the very trap that has led countless human investigators to misjudgment: implicit bias swayed by information irrelevant to the essence of the case? To answer this question, this study systematically injected personas (gender, race, relationship) into neutralized murder mystery scenarios and examined the reasoning stability of LLMs. Implicit bias propagation was observed across all models. One pattern was universal: models outwardly state “that information is irrelevant to the judgment” while their actual conclusions are already influenced by the injected persona. Interestingly, model scale alone did not guarantee stability: while the largest model achieved the lowest instability, several smaller models outperformed much larger ones. The most notable finding concerns the differential vulnerability across persona types: while race and gender were processed relatively stably, relationship information, particularly hostile relationships, induced significantly higher reasoning contamination. More concerning, even when conclusions were correctly maintained, the reasoning process itself was extensively contaminated. These findings suggest that current alignment techniques have created a blind spot by focusing on identity-based bias while neglecting relationship-based bias, and that stability evaluation should encompass not only outputs but also reasoning processes.
Make LLMs See Like Investigators, Not Just Think More: The Role of Structured Analysis in Investigative Reasoning
Jaewook Lee | Myeong-Cheol Kang | Jong-hun Shin
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Criminal investigators and intelligence analysts have developed structured analytic techniques to evaluate competing hypotheses under incomplete information. This study examines whether such human expert investigative methodologies are also effective for narrative-based culprit inference in large language models (LLMs). Focusing on the task of analyzing evidence from complex narratives and identifying the perpetrator among suspects, we conducted experiments on 10 LLMs using the MuSR murder mystery benchmark. The PRISM framework, which applies investigative techniques, consistently outperformed existing general-purpose strategies across all models, with its effectiveness manifesting regardless of model scale. Ablation studies revealed that the hypothesis structuring stage is particularly crucial, accounting for 89% of the methodological improvement beyond information filtering. This suggests that domain-specific structures that specify “what to analyze” are more effective in LLM reasoning than simply increasing the number of reasoning paths.