LuxDiagRC: A Diagnostic Reading Comprehension Corpus for Luxembourgish with Linguistic and Cognitive Annotation Layers

Christophe Friezas Gonçalves; Salima Lamsiyah; Christoph Schommer

LuxDiagRC: A Diagnostic Reading Comprehension Corpus for Luxembourgish with Linguistic and Cognitive Annotation Layers

Christophe Friezas Gonçalves, Salima Lamsiyah, Christoph Schommer

Abstract

Reading comprehension resources for low-resource languages remain limited, particularly datasets designed for educational assessment and diagnostic analysis in contrast to binary correctness.We present a diagnostically rich reading comprehension corpus forLuxembourgish, annotated using a two-layer framework that separateslinguistic sources of textual difficulty from cognitive and diagnosticproperties of comprehension questions. The linguistic layer captures span-level lexical, syntactic, morphological, and discourse-related features, while the cognitive layerannotates multiple-choice questions according to the PIRLS cognitiveprocesses and diagnostically meaningful distractor types following theSTARC framework.This design enables fine-grained analysis of reading comprehensionerrors by linking response patterns to underlying linguistic phenomena. The resulting corpus consists of 640 multiple-choice questions based on 16 annotated Luxembourgish texts. We describe the annotation methodology agreement measures, and will releasethe dataset as a publicly available resource for educational andlow-resource NLP research.

Anthology ID:: 2026.loreslm-1.46
Volume:: Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026)
Month:: March
Year:: 2026
Address:: Rabat, Morocco
Editors:: Hansi Hettiarachchi, Tharindu Ranasinghe, Alistair Plum, Paul Rayson, Ruslan Mitkov, Mohamed Gaber, Damith Premasiri, Fiona Anting Tan, Lasitha Uyangodage
Venue:: LoResLM
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 532–541
Language:
URL:: https://aclanthology.org/2026.loreslm-1.46/
DOI:
Bibkey:
Cite (ACL):: Christophe Friezas Gonçalves, Salima Lamsiyah, and Christoph Schommer. 2026. LuxDiagRC: A Diagnostic Reading Comprehension Corpus for Luxembourgish with Linguistic and Cognitive Annotation Layers. In Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026), pages 532–541, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):: LuxDiagRC: A Diagnostic Reading Comprehension Corpus for Luxembourgish with Linguistic and Cognitive Annotation Layers (Gonçalves et al., LoResLM 2026)
Copy Citation:
PDF:: https://aclanthology.org/2026.loreslm-1.46.pdf

PDF Cite Search Fix data