LuxDiagRC: A Diagnostic Reading Comprehension Corpus for Luxembourgish with Linguistic and Cognitive Annotation Layers

Christophe Friezas Gonçalves, Salima Lamsiyah, Christoph Schommer


Abstract
Reading comprehension resources for low-resource languages remain limited, particularly datasets designed for educational assessment and diagnostic analysis rather than binary correctness. We present a diagnostically rich reading comprehension corpus for Luxembourgish, annotated using a two-layer framework that separates linguistic sources of textual difficulty from cognitive and diagnostic properties of comprehension questions. The linguistic layer captures span-level lexical, syntactic, morphological, and discourse-related features, while the cognitive layer annotates multiple-choice questions according to the PIRLS cognitive processes and diagnostically meaningful distractor types following the STARC framework. This design enables fine-grained analysis of reading comprehension errors by linking response patterns to underlying linguistic phenomena. The resulting corpus consists of 640 multiple-choice questions based on 16 annotated Luxembourgish texts. We describe the annotation methodology and agreement measures, and will release the dataset as a publicly available resource for educational and low-resource NLP research.
Anthology ID:
2026.loreslm-1.46
Volume:
Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Hansi Hettiarachchi, Tharindu Ranasinghe, Alistair Plum, Paul Rayson, Ruslan Mitkov, Mohamed Gaber, Damith Premasiri, Fiona Anting Tan, Lasitha Uyangodage
Venue:
LoResLM
Publisher:
Association for Computational Linguistics
Pages:
532–541
URL:
https://aclanthology.org/2026.loreslm-1.46/
Cite (ACL):
Christophe Friezas Gonçalves, Salima Lamsiyah, and Christoph Schommer. 2026. LuxDiagRC: A Diagnostic Reading Comprehension Corpus for Luxembourgish with Linguistic and Cognitive Annotation Layers. In Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026), pages 532–541, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
LuxDiagRC: A Diagnostic Reading Comprehension Corpus for Luxembourgish with Linguistic and Cognitive Annotation Layers (Gonçalves et al., LoResLM 2026)
PDF:
https://aclanthology.org/2026.loreslm-1.46.pdf