IntrEx: A Dataset for Modeling Engagement in Educational Conversations

Xingwei Tan, Mahathi Parvatham, Chiara Gambi, Gabriele Pergola


Abstract
Engagement and motivation are crucial for second-language acquisition, yet maintaining learner interest in educational conversations remains a challenge. While prior research has explored what makes educational texts interesting, little is known about the linguistic features that drive engagement in conversations. To address this gap, we introduce IntrEx, the first large dataset annotated for interestingness and expected interestingness in teacher-student interactions. Built upon the Teacher-Student Chatroom Corpus (TSCC), IntrEx extends prior work by incorporating sequence-level annotations, which allow engagement to be studied beyond isolated turns and capture how interest evolves over extended dialogues. We employ a rigorous annotation process with over 100 second-language learners, using a comparison-based rating approach inspired by reinforcement learning from human feedback (RLHF) to improve agreement. We then investigate whether large language models (LLMs) can predict human interestingness judgments, and find that LLMs (7B/8B parameters) fine-tuned on interestingness ratings outperform larger proprietary models such as GPT-4o, demonstrating the potential of specialised datasets for modeling engagement in educational settings. Finally, we analyze how linguistic and cognitive factors, such as concreteness, comprehensibility (readability), and uptake, influence engagement in educational dialogues.
Anthology ID:
2025.findings-emnlp.1191
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
21830–21845
URL:
https://aclanthology.org/2025.findings-emnlp.1191/
Cite (ACL):
Xingwei Tan, Mahathi Parvatham, Chiara Gambi, and Gabriele Pergola. 2025. IntrEx: A Dataset for Modeling Engagement in Educational Conversations. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 21830–21845, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
IntrEx: A Dataset for Modeling Engagement in Educational Conversations (Tan et al., Findings 2025)
PDF:
https://aclanthology.org/2025.findings-emnlp.1191.pdf
Checklist:
 2025.findings-emnlp.1191.checklist.pdf