Habib Yağız Demir

2026

TimeRes: A Turkish Benchmark For Evaluating Temporal Understanding of Large Language Models
Habib Yağız Demir | Ümit Atlamaz | Susan Üsküdarlı
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 4: Student Research Workshop)

Temporal information is an essential part of communication, and understanding language requires processing it effectively. Despite recent advances, Large Language Models (LLMs) still struggle with temporal understanding.Existing benchmarks primarily focus on English and underexplore how linguistic structure contributes to temporal meaning.As a result, temporal understanding in languages other than English remains largely understudied.In this paper, we introduce TimeRes, a Turkish benchmark for evaluating temporal understanding of LLMs. TimeRes aims to investigate comprehension of Reichenbach’s temporal points and reported speech through date arithmetic.Our dataset includes 4,600 questions across 4 tasks at two levels of complexity, and presents a paired question formulation to distinguish temporal discourse understanding from temporal arithmetic capabilities.We evaluated six LLMs, and demonstrated that models struggle to resolve reported speech and fail to generalize across word order variations.

Co-authors

Venues

EACL1

Fix author