@inproceedings{palomino-paassen-2025-benchmarking,
title = "Benchmarking Item Difficulty Classification in {G}erman Vocational Education and Training",
author = "Palomino, Alonso and
Paassen, Benjamin",
editor = "Angelova, Galia and
Kunilovskaya, Maria and
Escribe, Marie and
Mitkov, Ruslan",
booktitle = "Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era",
month = sep,
year = "2025",
address = "Varna, Bulgaria",
publisher = "INCOMA Ltd., Shoumen, Bulgaria",
url = "https://aclanthology.org/2025.ranlp-1.99/",
pages = "870--875",
abstract = "Predicting the difficulty of exam questions or items is essential to effectively assembling and calibrating exams. While item response theory (IRT) models can estimate item difficulty, they require student responses that are costly and rarely available at scale. Natural language processing methods offer a text-only alternative; however, due to the scarcity of real-world labeled data, prior work often relies on synthetic or domain-specific corpora, limiting generalizability and overlooking the nuanced challenges of real-world text-based item difficulty estimation. Addressing this gap, we benchmark 122 classifiers on 935 German Vocational Education and Training (VET) items labeled via previous IRT analysis to assess feasibility under real-world conditions. In our setup, a stacked ensemble that combines linguistic features, pre-trained embeddings, and external semantic resources outperforms both transformer-based models and few-shot large language models, achieving moderate performance. We report findings and discuss limitations in the context of German VET."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="palomino-paassen-2025-benchmarking">
<titleInfo>
<title>Benchmarking Item Difficulty Classification in German Vocational Education and Training</title>
</titleInfo>
<name type="personal">
<namePart type="given">Alonso</namePart>
<namePart type="family">Palomino</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Benjamin</namePart>
<namePart type="family">Paassen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-09</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era</title>
</titleInfo>
<name type="personal">
<namePart type="given">Galia</namePart>
<namePart type="family">Angelova</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Maria</namePart>
<namePart type="family">Kunilovskaya</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Marie</namePart>
<namePart type="family">Escribe</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ruslan</namePart>
<namePart type="family">Mitkov</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>INCOMA Ltd., Shoumen, Bulgaria</publisher>
<place>
<placeTerm type="text">Varna, Bulgaria</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>Predicting the difficulty of exam questions or items is essential to effectively assembling and calibrating exams. While item response theory (IRT) models can estimate item difficulty, they require student responses that are costly and rarely available at scale. Natural language processing methods offer a text-only alternative; however, due to the scarcity of real-world labeled data, prior work often relies on synthetic or domain-specific corpora, limiting generalizability and overlooking the nuanced challenges of real-world text-based item difficulty estimation. Addressing this gap, we benchmark 122 classifiers on 935 German Vocational Education and Training (VET) items labeled via previous IRT analysis to assess feasibility under real-world conditions. In our setup, a stacked ensemble that combines linguistic features, pre-trained embeddings, and external semantic resources outperforms both transformer-based models and few-shot large language models, achieving moderate performance. We report findings and discuss limitations in the context of German VET.</abstract>
<identifier type="citekey">palomino-paassen-2025-benchmarking</identifier>
<location>
<url>https://aclanthology.org/2025.ranlp-1.99/</url>
</location>
<part>
<date>2025-09</date>
<extent unit="page">
<start>870</start>
<end>875</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Benchmarking Item Difficulty Classification in German Vocational Education and Training
%A Palomino, Alonso
%A Paassen, Benjamin
%Y Angelova, Galia
%Y Kunilovskaya, Maria
%Y Escribe, Marie
%Y Mitkov, Ruslan
%S Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
%D 2025
%8 September
%I INCOMA Ltd., Shoumen, Bulgaria
%C Varna, Bulgaria
%F palomino-paassen-2025-benchmarking
%X Predicting the difficulty of exam questions or items is essential to effectively assembling and calibrating exams. While item response theory (IRT) models can estimate item difficulty, they require student responses that are costly and rarely available at scale. Natural language processing methods offer a text-only alternative; however, due to the scarcity of real-world labeled data, prior work often relies on synthetic or domain-specific corpora, limiting generalizability and overlooking the nuanced challenges of real-world text-based item difficulty estimation. Addressing this gap, we benchmark 122 classifiers on 935 German Vocational Education and Training (VET) items labeled via previous IRT analysis to assess feasibility under real-world conditions. In our setup, a stacked ensemble that combines linguistic features, pre-trained embeddings, and external semantic resources outperforms both transformer-based models and few-shot large language models, achieving moderate performance. We report findings and discuss limitations in the context of German VET.
%U https://aclanthology.org/2025.ranlp-1.99/
%P 870-875
Markdown (Informal)
[Benchmarking Item Difficulty Classification in German Vocational Education and Training](https://aclanthology.org/2025.ranlp-1.99/) (Palomino & Paassen, RANLP 2025)
ACL
Alonso Palomino and Benjamin Paassen. 2025. Benchmarking Item Difficulty Classification in German Vocational Education and Training. In Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era, pages 870–875, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.