Swapna Somasundaran

2024

Assessing Online Writing Feedback Resources: Generative AI vs. Good Samaritans
Shabnam Behzad | Omid Kashefi | Swapna Somasundaran
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Providing constructive feedback on student essays is a critical factor in improving educational results; however, it presents notable difficulties and may demand substantial time investments, especially when aiming to deliver individualized and informative guidance. This study undertakes a comparative analysis of two readily available online resources for students seeking to hone their skills in essay writing for English proficiency tests: 1) essayforum.com, a widely used platform where students can submit their essays and receive feedback from volunteer educators at no cost, and 2) Large Language Models (LLMs) such as ChatGPT. By contrasting the feedback obtained from these two resources, we posit that they can mutually reinforce each other and are more helpful if employed in conjunction when seeking no-cost online assistance. The findings of this research shed light on the challenges of providing personalized feedback and highlight the potential of AI in advancing the field of automated essay evaluation.

pdf bib abs

LEAF: Language Learners’ English Essays and Feedback Corpus
Shabnam Behzad | Omid Kashefi | Swapna Somasundaran
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)

This paper addresses the issue of automated feedback generation for English language learners by presenting a corpus of English essays and their corresponding feedback, called LEAF, collected from the “essayforum” website. The corpus comprises approximately 6K essay-feedback pairs, offering a diverse and valuable resource for developing personalized feedback generation systems that address the critical deficiencies within essays, spanning from rectifying grammatical errors to offering insights on argumentative aspects and organizational coherence. Using this corpus, we present and compare multiple feedback generation baselines. Our findings shed light on the challenges of providing personalized feedback and highlight the potential of the LEAF corpus in advancing automated essay evaluation.

pdf bib abs

Identifying Fairness Issues in Automatically Generated Testing Content
Kevin Stowe | Benny Longwill | Alyssa Francis | Tatsuya Aoyama | Debanjan Ghosh | Swapna Somasundaran
Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024)

Natural language generation tools are powerful and effective for generating content. However, language models are known to display bias and fairness issues, making them impractical to deploy for many use cases. We here focus on how fairness issues impact automatically generated test content, which can have stringent requirements to ensure the test measures only what it was intended to measure. Specifically, we review test content generated for a large-scale standardized English proficiency test with the goal of identifying content that only pertains to a certain subset of the test population as well as content that has the potential to be upsetting or distracting to some test takers. Issues like these could inadvertently impact a test taker’s score and thus should be avoided. This kind of content does not reflect the more commonly-acknowledged biases, making it challenging even for modern models that contain safeguards. We build a dataset of 601 generated texts annotated for fairness and explore a variety of methods for classification: fine-tuning, topic-based classification, and prompting, including few-shot and self-correcting prompts. We find that combining prompt self-correction and few-shot learning performs best, yielding an F1 score of 0.79 on our held-out test set, while much smaller BERT- and topic-based models have competitive performance on out-of-domain data.

pdf bib abs

When Argumentation Meets Cohesion: Enhancing Automatic Feedback in Student Writing
Yuning Ding | Omid Kashefi | Swapna Somasundaran | Andrea Horbach
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In this paper, we investigate the role of arguments in the automatic scoring of cohesion in argumentative essays. The feature analysis reveals that in argumentative essays, the lexical cohesion between claims is more important to the overall cohesion, while the evidence is expected to be diverse and divergent. Our results show that combining features related to argument segments and cohesion features improves the performance of the automatic cohesion scoring model trained on a transformer. The cohesion score is also learned more accurately in a multi-task learning process by adding the automatic segmentation of argumentative elements as an auxiliary task. Our findings contribute to both the understanding of cohesion in argumentative writing and the development of automatic feedback.

2023

pdf bib abs

Argument Detection in Student Essays under Resource Constraints
Omid Kashefi | Sophia Chan | Swapna Somasundaran
Proceedings of the 10th Workshop on Argument Mining

Learning to make effective arguments is vital for the development of critical-thinking in students and, hence, for their academic and career success. Detecting argument components is crucial for developing systems that assess students’ ability to develop arguments. Traditionally, supervised learning has been used for this task, but this requires a large corpus of reliable training examples which are often impractical to obtain for student writing. Large language models have also been shown to be effective few-shot learners, making them suitable for low-resource argument detection. However, concerns such as latency, service reliability, and data privacy might hinder their practical applicability. To address these challenges, we present a low-resource classification approach that combines the intrinsic entailment relationship among the argument elements with a parameter-efficient prompt-tuning strategy. Experimental results demonstrate the effectiveness of our method in reducing the data and computation requirements of training an argument detection model without compromising the prediction accuracy. This suggests the practical applicability of our model across a variety of real-world settings, facilitating broader access to argument classification for researchers spanning various domains and problem scenarios.

2022

pdf bib abs

AGReE: A system for generating Automated Grammar Reading Exercises
Sophia Chan | Swapna Somasundaran | Debanjan Ghosh | Mengxuan Zhao
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

We describe the AGReE system, which takes user-submitted passages as input and automatically generates grammar practice exercises that can be completed while reading. Multiple-choice practice items are generated for a variety of different grammar constructs: punctuation, articles, conjunctions, pronouns, prepositions, verbs, and nouns. We also conducted a large-scale human evaluation with around 4,500 multiple-choice practice items. We notice for 95% of items, a majority of raters out of five were able to identify the correct answer, for 85% of cases, raters agree that there is only one correct answer among the choices. Finally, the error analysis shows that raters made the most mistakes for punctuation and conjunctions.

2021

pdf bib abs

Training and Domain Adaptation for Supervised Text Segmentation
Goran Glavaš | Ananya Ganesh | Swapna Somasundaran
Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications

Unlike traditional unsupervised text segmentation methods, recent supervised segmentation models rely on Wikipedia as the source of large-scale segmentation supervision. These models have, however, predominantly been evaluated on the in-domain (Wikipedia-based) test sets, preventing conclusions about their general segmentation efficacy. In this work, we focus on the domain transfer performance of supervised neural text segmentation in the educational domain. To this end, we first introduce K12Seg, a new dataset for evaluation of supervised segmentation, created from educational reading material for grade-1 to college-level students. We then benchmark a hierarchical text segmentation model (HITS), based on RoBERTa, in both in-domain and domain-transfer segmentation experiments. While HITS produces state-of-the-art in-domain performance (on three Wikipedia-based test sets), we show that, subject to the standard full-blown fine-tuning, it is susceptible to domain overfitting. We identify adapter-based fine-tuning as a remedy that substantially improves transfer performance.