Bilal Khan


2025

pdf bib
A Three-Tier LLM Framework for Forecasting Student Engagement from Qualitative Longitudinal Data
Ahatsham Hayat | Helen Martinez | Bilal Khan | Mohammad Rashedul Hasan
Proceedings of the 29th Conference on Computational Natural Language Learning

Forecasting nuanced shifts in student engagement from longitudinal experiential (LE) data—multi-modal, qualitative trajectories of academic experiences over time—remains challenging due to high dimensionality and missingness. We propose a natural language processing (NLP)-driven framework using large language models (LLMs) to forecast binary engagement levels across four dimensions: Lecture Engagement Disposition, Academic Self-Efficacy, Performance Self-Evaluation, and Academic Identity and Value Perception. Evaluated on 960 trajectories from 96 first-year STEM students, our three-tier approach—LLM-informed imputation to generate textual descriptors for missing-not-at-random (MNAR) patterns, zero-shot feature selection via ensemble voting, and fine-tuned LLMs—processes textual non-cognitive responses. LLMs substantially outperform numeric baselines (e.g., Random Forest, LSTM) by capturing contextual nuances in student responses. Encoder-only LLMs surpass decoder-only variants, highlighting architectural strengths for sparse, qualitative LE data. Our framework advances NLP solutions for modeling student engagement from complex LE data, excelling where traditional methods struggle.

pdf bib
ConText-LE: Cross-Distribution Generalization for Longitudinal Experiential Data via Narrative-Based LLM Representations
Ahatsham Hayat | Bilal Khan | Mohammad Rashedul Hasan
Findings of the Association for Computational Linguistics: EMNLP 2025

Longitudinal experiential data offers rich insights into dynamic human states, yet building models that generalize across diverse contexts remains challenging. We propose ConText-LE, a framework that systematically investigates text representation strategies and output formulations to maximize large language model cross-distribution generalization for behavioral forecasting. Our novel Meta-Narrative representation synthesizes complex temporal patterns into semantically rich narratives, while Prospective Narrative Generation reframes prediction as a generative task aligned with LLMs’ contextual understanding capabilities. Through comprehensive experiments on three diverse longitudinal datasets addressing the underexplored challenge of cross-distribution generalization in mental health and educational forecasting, we show that combining Meta-Narrative input with Prospective Narrative Generation significantly outperforms existing approaches. Our method achieves up to 12.28% improvement in out-of-distribution accuracy and up to 11.99% improvement in F1 scores over binary classification methods. Bidirectional evaluation and architectural ablation studies confirm the robustness of our approach, establishing ConText-LE as an effective framework for reliable behavioral forecasting across temporal and contextual shifts.

2024

pdf bib
Improving Transfer Learning for Early Forecasting of Academic Performance by Contextualizing Language Models
Ahatsham Hayat | Bilal Khan | Mohammad Hasan
Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024)

This paper presents a cutting-edge method that harnesses contextualized language models (LMs) to significantly enhance the prediction of early academic performance in STEM fields. Our approach uniquely tackles the challenge of transfer learning with limited-domain data. Specifically, we overcome this challenge by contextualizing students’ cognitive trajectory data through the integration of both distal background factors (comprising academic information, demographic details, and socioeconomic indicators) and proximal non-cognitive factors (such as emotional engagement). By tapping into the rich prior knowledge encoded within pre-trained LMs, we effectively reframe academic performance forecasting as a task ideally suited for natural language processing.Our research rigorously examines three key aspects: the impact of data contextualization on prediction improvement, the effectiveness of our approach compared to traditional numeric-based models, and the influence of LM capacity on prediction accuracy. The results underscore the significant advantages of utilizing larger LMs with contextualized inputs, representing a notable advancement in the precision of early performance forecasts. These findings emphasize the importance of employing contextualized LMs to enhance artificial intelligence-driven educational support systems and overcome data scarcity challenges.