Using Generative AI to Develop a Common Metric in Item Response Theory

Peter Baldwin


Abstract
We propose a method for linking independently calibrated item response theory (IRT) scales using large language models to generate shared parameter estimates across forms. Applied to medical licensure data, the approach reliably recovers slope values across all conditions and yields accurate intercepts when cross-form differences in item difficulty are small.
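This page does not say how the LLM-generated estimates enter the linking step. As a generic illustration of what placing two calibrations on a common metric involves, the sketch below applies classical mean/mean linking. That is a standard method and not necessarily the one used in the paper. It operates on parameter estimates that two scales share for the same items; under the proposed approach, those shared estimates would presumably come from the language model. The 2PL slope-intercept form logit = a·θ + d is assumed, as are the hypothetical function names `mean_mean_linking` and `to_scale_y`. The numbers are illustrative only.

```python
from statistics import mean

def mean_mean_linking(a_x, b_x, a_y, b_y):
    """Estimate (A, B) with theta_Y = A * theta_X + B from the same items'
    discriminations (a) and difficulties (b) expressed on scales X and Y."""
    A = mean(a_x) / mean(a_y)      # discriminations scale by 1/A from X to Y
    B = mean(b_y) - A * mean(b_x)  # difficulties transform like theta
    return A, B

def to_scale_y(a, d, A, B):
    """Re-express a 2PL item in slope-intercept form (logit = a*theta + d)
    from scale X on scale Y, given theta_Y = A * theta_X + B."""
    return a / A, d - a * B / A

# Illustrative values only (not data from the paper): scale-Y estimates are
# built from scale-X estimates with known constants A = 1.2, B = 0.5.
a_x = [1.0, 1.5, 0.8]
b_x = [-0.5, 0.0, 1.0]
a_y = [a / 1.2 for a in a_x]
b_y = [1.2 * b + 0.5 for b in b_x]

A, B = mean_mean_linking(a_x, b_x, a_y, b_y)
print(round(A, 3), round(B, 3))  # 1.2 0.5
```

Because the scale-Y values were generated from known constants, the recovered A and B can be checked directly. In `to_scale_y`, the slope changes only through A, while the intercept depends on both A and B. This loosely parallels the abstract's observation that slopes are recovered more robustly than intercepts.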
Anthology ID: 2025.aimecon-main.30
Volume: Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Full Papers
Month: October
Year: 2025
Address: Wyndham Grand Pittsburgh, Downtown, Pittsburgh, Pennsylvania, United States
Editors: Joshua Wilson, Christopher Ormerod, Magdalen Beiting Parrish
Venue: AIME-Con
Publisher: National Council on Measurement in Education (NCME)
Pages: 281–289
URL: https://aclanthology.org/2025.aimecon-main.30/
Cite (ACL): Peter Baldwin. 2025. Using Generative AI to Develop a Common Metric in Item Response Theory. In Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Full Papers, pages 281–289, Wyndham Grand Pittsburgh, Downtown, Pittsburgh, Pennsylvania, United States. National Council on Measurement in Education (NCME).
Cite (Informal): Using Generative AI to Develop a Common Metric in Item Response Theory (Baldwin, AIME-Con 2025)
PDF: https://aclanthology.org/2025.aimecon-main.30.pdf