Tazin Afrin

2025

Toward Automated Evaluation of AI-Generated Item Drafts in Clinical Assessment
Tazin Afrin | Le An Ha | Victoria Yaneva | Keelan Evanini | Steven Go | Kristine DeRuchie | Michael Heilig
Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Full Papers

This study examines the classification of AI-generated drafts of clinical multiple-choice questions as “helpful” or “non-helpful” starting points. Expert judgments were analyzed, and multiple classifiers were evaluated, including feature-based models, fine-tuned transformers, and few-shot prompting with GPT-4. Our findings highlight the challenges and considerations for evaluation methods of AI-generated items in clinical test development.

2023

Predicting Desirable Revisions of Evidence and Reasoning in Argumentative Writing
Tazin Afrin | Diane Litman
Findings of the Association for Computational Linguistics: EACL 2023

We develop models to classify desirable evidence and desirable reasoning revisions in student argumentative writing. We explore two ways to improve classifier performance: using the essay context of the revision, and using the feedback students received before the revision. We perform both intrinsic and extrinsic evaluation for each of our models and report a qualitative analysis. Our results show that while a model using feedback information improves over a baseline model, models utilizing context, either alone or with feedback, are the most successful in identifying desirable revisions.

2020

Annotation and Classification of Evidence and Reasoning Revisions in Argumentative Writing
Tazin Afrin | Elaine Lin Wang | Diane Litman | Lindsay Clare Matsumura | Richard Correnti
Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications

Automated writing evaluation systems can improve students’ writing insofar as students attend to the feedback provided and revise their essay drafts in ways aligned with such feedback. Existing research on revision of argumentative writing in such systems, however, has focused on the types of revisions students make (e.g., surface vs. content) rather than the extent to which revisions actually respond to the feedback provided and improve the essay. We introduce an annotation scheme to capture the nature of sentence-level revisions of evidence use and reasoning (the ‘RER’ scheme) and apply it to 5th- and 6th-grade students’ argumentative essays. We show that reliable manual annotation can be achieved and that revision annotations correlate with a holistic assessment of essay improvement in line with the feedback provided. Furthermore, we explore the feasibility of automatically classifying revisions according to our scheme.

2018

Annotation and Classification of Sentence-level Revision Improvement
Tazin Afrin | Diane Litman
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications

Studies of writing revisions rarely focus on revision quality. To address this issue, we introduce a corpus of between-draft revisions of student argumentative essays, annotated as to whether each revision improves essay quality. We demonstrate a potential use of our annotations by developing a machine learning model to predict revision improvement. With the goal of expanding training data, we also extract revisions from a dataset edited by expert proofreaders. Our results indicate that blending expert and non-expert revisions increases model performance, with expert data particularly important for predicting low-quality revisions.

Co-authors

Le An Ha 1

Michael Heilig 1

Lindsay Clare Matsumura 1

Elaine Lin Wang 1

Victoria Yaneva 1
