SIGHT: A Large Annotated Dataset on Student Insights Gathered from Higher Education Transcripts

Rose Wang, Pawan Wirawarn, Noah Goodman, Dorottya Demszky


Abstract
Lectures are a learning experience for both students and teachers. Students learn from teachers about the subject material, while teachers learn from students about how to refine their instruction. Unfortunately, online student feedback is unstructured and abundant, making it challenging for teachers to learn and improve. We take a step towards tackling this challenge. First, we contribute a dataset for studying this problem: SIGHT is a large dataset of 288 math lecture transcripts and 15,784 comments collected from the Massachusetts Institute of Technology OpenCourseWare (MIT OCW) YouTube channel. Second, we develop a rubric for categorizing feedback types using qualitative analysis. Qualitative analysis methods are powerful in uncovering domain-specific insights, however they are costly to apply to large data sources. To overcome this challenge, we propose a set of best practices for using large language models (LLMs) to cheaply classify the comments at scale. We observe a striking correlation between the model’s and humans’ annotation: Categories with consistent human annotations (0.9 inter-rater reliability, IRR) also display higher human-model agreement (0.7), while categories with less consistent human annotations (0.7-0.8 IRR) correspondingly demonstrate lower human-model agreement (0.3-0.5). These techniques uncover useful student feedback from thousands of comments, costing around $0.002 per comment. We conclude by discussing exciting future directions on using online student feedback and improving automated annotation techniques for qualitative research.
Anthology ID:
2023.bea-1.27
Volume:
Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Ekaterina Kochmar, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Nitin Madnani, Anaïs Tack, Victoria Yaneva, Zheng Yuan, Torsten Zesch
Venue:
BEA
SIG:
SIGEDU
Publisher:
Association for Computational Linguistics
Note:
Pages:
315–351
Language:
URL:
https://aclanthology.org/2023.bea-1.27
DOI:
10.18653/v1/2023.bea-1.27
Bibkey:
Cite (ACL):
Rose Wang, Pawan Wirawarn, Noah Goodman, and Dorottya Demszky. 2023. SIGHT: A Large Annotated Dataset on Student Insights Gathered from Higher Education Transcripts. In Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023), pages 315–351, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
SIGHT: A Large Annotated Dataset on Student Insights Gathered from Higher Education Transcripts (Wang et al., BEA 2023)
Copy Citation:
PDF:
https://aclanthology.org/2023.bea-1.27.pdf
Video:
 https://aclanthology.org/2023.bea-1.27.mp4