Flesch or Fumble? Evaluating Readability Standard Alignment of Instruction-Tuned Language Models

Joseph Marvin Imperial, Harish Tayyar Madabushi


Abstract
Readability metrics and standards such as the Flesch-Kincaid Grade Level (FKGL) and the Common European Framework of Reference for Languages (CEFR) exist to guide teachers and educators in properly assessing the complexity of educational materials before administering them for classroom use. In this study, we select a diverse set of open and closed-source instruction-tuned language models and investigate their performance in writing story completions and simplifying narratives—tasks that teachers perform—using standard-guided prompts that control text readability. Our extensive findings provide empirical evidence that globally recognized models like ChatGPT may be less effective and may require more refined prompts for these generative tasks than open-source models such as BLOOMZ and FlanT5, which have shown promising results.
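The FKGL metric referenced in the abstract has a standard closed-form definition: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. A minimal sketch of that formula follows; it is not from the paper, and the vowel-run syllable counter is a naive heuristic assumption (production tools use dictionary-based counters):

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count runs of vowels, subtracting a trailing silent 'e'."""
    word = word.lower()
    groups = re.findall(r"[aeiouy]+", word)
    count = len(groups)
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)
```

Short, monosyllabic sentences yield a low (even negative) grade level, while long sentences with polysyllabic words push the score toward higher grades.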
Anthology ID:
2023.gem-1.18
Volume:
Proceedings of the Third Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)
Month:
December
Year:
2023
Address:
Singapore
Editors:
Sebastian Gehrmann, Alex Wang, João Sedoc, Elizabeth Clark, Kaustubh Dhole, Khyathi Raghavi Chandu, Enrico Santus, Hooman Sedghamiz
Venues:
GEM | WS
Publisher:
Association for Computational Linguistics
Pages:
205–223
URL:
https://aclanthology.org/2023.gem-1.18
Cite (ACL):
Joseph Marvin Imperial and Harish Tayyar Madabushi. 2023. Flesch or Fumble? Evaluating Readability Standard Alignment of Instruction-Tuned Language Models. In Proceedings of the Third Workshop on Natural Language Generation, Evaluation, and Metrics (GEM), pages 205–223, Singapore. Association for Computational Linguistics.
Cite (Informal):
Flesch or Fumble? Evaluating Readability Standard Alignment of Instruction-Tuned Language Models (Imperial & Tayyar Madabushi, GEM-WS 2023)
PDF:
https://aclanthology.org/2023.gem-1.18.pdf