Habib Yağız Demir


2026

Temporal information is an essential part of communication, and understanding language requires processing it effectively. Despite recent advances, Large Language Models (LLMs) still struggle with temporal understanding. Existing benchmarks focus primarily on English and underexplore how linguistic structure contributes to temporal meaning. As a result, temporal understanding in languages other than English remains largely unexplored. In this paper, we introduce TimeRes, a Turkish benchmark for evaluating the temporal understanding of LLMs. TimeRes probes comprehension of Reichenbach's temporal points (speech time, event time, and reference time) and of reported speech through date arithmetic. The dataset comprises 4,600 questions across four tasks at two levels of complexity and uses a paired question formulation to separate temporal discourse understanding from temporal arithmetic ability. We evaluate six LLMs and show that models struggle to resolve reported speech and fail to generalize across word order variations.