A Three-Tier LLM Framework for Forecasting Student Engagement from Qualitative Longitudinal Data

Ahatsham Hayat; Helen Martinez; Bilal Khan; Mohammad Rashedul Hasan

doi:10.18653/v1/2025.conll-1.22

Abstract

Forecasting nuanced shifts in student engagement from longitudinal experiential (LE) data, that is, multi-modal, qualitative trajectories of academic experiences over time, remains challenging due to high dimensionality and missingness. We propose a natural language processing (NLP)-driven framework that uses large language models (LLMs) to forecast binary engagement levels across four dimensions: Lecture Engagement Disposition, Academic Self-Efficacy, Performance Self-Evaluation, and Academic Identity and Value Perception. Our three-tier approach processes students' textual non-cognitive responses through (1) LLM-informed imputation, which generates textual descriptors for missing-not-at-random (MNAR) patterns, (2) zero-shot feature selection via ensemble voting, and (3) fine-tuned LLMs. On 960 trajectories from 96 first-year STEM students, LLMs substantially outperform numeric baselines (e.g., Random Forest, LSTM) by capturing contextual nuances in student responses. Encoder-only LLMs surpass decoder-only variants, highlighting architectural strengths for sparse, qualitative LE data. Our framework advances NLP solutions for modeling student engagement from complex LE data, excelling where traditional methods struggle.
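The first two tiers can be illustrated with a minimal Python sketch. It assumes a trajectory is stored as a mapping from week number to question/answer pairs, with `None` marking a missing response. The function names `describe_missing`, `serialize_trajectory`, and `ensemble_vote` are hypothetical. The fixed template in `describe_missing` only stands in for the LLM-generated descriptors the abstract mentions, and the strict-majority threshold is an assumption rather than the paper's documented voting rule.

```python
from __future__ import annotations

from collections import Counter
from typing import Optional


def describe_missing(question: str, week: int) -> str:
    """Hypothetical stand-in for Tier 1: the framework uses an LLM to write
    this descriptor; a fixed template is used here so the sketch runs offline."""
    return f"No response to '{question}' was recorded in week {week}."


def serialize_trajectory(responses: dict[int, dict[str, Optional[str]]]) -> str:
    """Flatten a week -> {question: answer or None} trajectory into one text,
    replacing each missing answer with a textual descriptor instead of dropping it."""
    lines = []
    for week in sorted(responses):  # chronological order
        for question, answer in responses[week].items():
            if answer is None:
                lines.append(describe_missing(question, week))
            else:
                lines.append(f"Week {week}, {question}: {answer}")
    return "\n".join(lines)


def ensemble_vote(selections: list[set[str]], min_votes: Optional[int] = None) -> set[str]:
    """Tier 2 sketch: keep features proposed by at least `min_votes` selectors.
    Each set is one (hypothetical) zero-shot selector's chosen features."""
    if min_votes is None:
        min_votes = len(selections) // 2 + 1  # assumed rule: strict majority
    counts = Counter(feature for chosen in selections for feature in chosen)
    return {feature for feature, n in counts.items() if n >= min_votes}


if __name__ == "__main__":
    trajectory = {
        1: {"attended lecture": "Yes, took notes", "confidence": "Fairly confident"},
        2: {"attended lecture": None, "confidence": "Less confident than last week"},
    }
    print(serialize_trajectory(trajectory))
    votes = [
        {"attended lecture", "confidence"},
        {"confidence"},
        {"confidence", "study hours"},
    ]
    print(sorted(ensemble_vote(votes)))  # ['confidence']
```

Keeping a missing answer as an explicit sentence, rather than deleting it or filling in a number, lets the downstream language model see the missingness pattern itself, which is the information an MNAR setting says is predictive.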

Anthology ID: 2025.conll-1.22
Volume: Proceedings of the 29th Conference on Computational Natural Language Learning
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Gemma Boleda, Michael Roth
Venues: CoNLL | WS
Publisher: Association for Computational Linguistics
Pages: 334–347
URL: https://aclanthology.org/2025.conll-1.22/
DOI: 10.18653/v1/2025.conll-1.22
Cite (ACL): Ahatsham Hayat, Helen Martinez, Bilal Khan, and Mohammad Rashedul Hasan. 2025. A Three-Tier LLM Framework for Forecasting Student Engagement from Qualitative Longitudinal Data. In Proceedings of the 29th Conference on Computational Natural Language Learning, pages 334–347, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): A Three-Tier LLM Framework for Forecasting Student Engagement from Qualitative Longitudinal Data (Hayat et al., CoNLL 2025)
PDF: https://aclanthology.org/2025.conll-1.22.pdf
Attachment: 2025.conll-1.22.atachment.pdf