Automated evaluation of written discourse coherence using GPT-4

Ben Naismith; Phoebe Mulcaire; Jill Burstein

doi:10.18653/v1/2023.bea-1.32

Abstract

The popularization of large language models (LLMs) such as OpenAI’s GPT-3 and GPT-4 has led to numerous innovations in the field of AI in education. With respect to automated writing evaluation (AWE), LLMs have reduced challenges associated with assessing writing quality characteristics that are difficult to identify automatically, such as discourse coherence. In addition, LLMs can provide rationales for their evaluations (ratings), which increases score interpretability and transparency. This paper investigates one approach to producing ratings by training GPT-4 to assess discourse coherence in a manner consistent with expert human raters. The findings of the study suggest that GPT-4 has strong potential to produce discourse coherence ratings that are comparable to human ratings, accompanied by clear rationales. Furthermore, the GPT-4 ratings outperform traditional NLP coherence metrics with respect to agreement with human ratings. These results have implications for advancing AWE technology for learning and assessment.

Anthology ID: 2023.bea-1.32
Volume: Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023)
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Ekaterina Kochmar, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Nitin Madnani, Anaïs Tack, Victoria Yaneva, Zheng Yuan, Torsten Zesch
Venue: BEA
SIG: SIGEDU
Publisher: Association for Computational Linguistics
Pages: 394–403
URL: https://aclanthology.org/2023.bea-1.32
DOI: 10.18653/v1/2023.bea-1.32
Cite (ACL): Ben Naismith, Phoebe Mulcaire, and Jill Burstein. 2023. Automated evaluation of written discourse coherence using GPT-4. In Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023), pages 394–403, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Automated evaluation of written discourse coherence using GPT-4 (Naismith et al., BEA 2023)
PDF: https://aclanthology.org/2023.bea-1.32.pdf
