Search if you don’t know! Knowledge-Augmented Korean Grammatical Error Correction with Large Language Models

Seonmin Koo, Jinsung Kim, Chanjun Park, Heuiseok Lim


Abstract
Grammatical error correction (GEC) is a practical task used in the real world, and it has advanced considerably alongside the development of large language models (LLMs). However, these achievements have been obtained primarily in English, and performance on non-English data, such as Korean, lags behind. We hypothesize that this gap arises because relying solely on the parametric knowledge of LLMs makes it difficult to thoroughly understand the given context in Korean GEC. Therefore, we propose a Knowledge-Augmented GEC (KAGEC) framework that incorporates evidential information from external sources into the prompt for the GEC task. KAGEC first extracts salient phrases from the given source and then retrieves non-parametric knowledge based on these phrases, aiming to enhance the context-aware generation capabilities of LLMs. Furthermore, we conduct validations on fine-grained error types to identify those that require retrieval augmentation when LLMs perform Korean GEC. According to experimental results, most LLMs, including ChatGPT, demonstrate significant performance improvements when applying KAGEC.
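The two steps the abstract describes (extract salient phrases, then retrieve external knowledge and place it in the prompt) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names, the prompt wording, and in particular the phrase extractor, which is a naive longest-token stand-in rather than the authors' extraction method. The retriever is passed in as a plain callable so that any search backend could be substituted.

```python
from typing import Callable, List


def extract_salient_phrases(source: str, max_phrases: int = 3) -> List[str]:
    """Hypothetical stand-in for salient-phrase extraction.

    Picks the longest distinct whitespace-separated tokens. This is NOT
    the paper's method, only a placeholder for the first step.
    """
    # dict.fromkeys removes duplicates while preserving first-seen order.
    unique_tokens = list(dict.fromkeys(source.split()))
    # sorted() is stable, so tokens of equal length keep their original order.
    return sorted(unique_tokens, key=len, reverse=True)[:max_phrases]


def build_kagec_prompt(
    source: str,
    retrieve: Callable[[str], List[str]],
    max_phrases: int = 3,
) -> str:
    """Assemble a knowledge-augmented GEC prompt (assumed wording).

    `retrieve` maps a phrase to a list of evidence snippets; it is a
    hypothetical interface standing in for an external knowledge source.
    """
    evidence: List[str] = []
    for phrase in extract_salient_phrases(source, max_phrases):
        evidence.extend(retrieve(phrase))

    lines = ["Correct the grammatical errors in the Korean sentence."]
    if evidence:
        lines.append("Evidence:")
        lines.extend(f"- {snippet}" for snippet in evidence)
    lines.append(f"Sentence: {source}")
    lines.append("Corrected:")
    return "\n".join(lines)
```

The resulting string would then be sent to an LLM of choice. When the retriever returns nothing, the evidence section is omitted, so the prompt falls back to a plain GEC instruction that relies on parametric knowledge alone.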
Anthology ID:
2024.findings-emnlp.6
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
96–125
URL:
https://aclanthology.org/2024.findings-emnlp.6
DOI:
10.18653/v1/2024.findings-emnlp.6
Cite (ACL):
Seonmin Koo, Jinsung Kim, Chanjun Park, and Heuiseok Lim. 2024. Search if you don’t know! Knowledge-Augmented Korean Grammatical Error Correction with Large Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 96–125, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Search if you don’t know! Knowledge-Augmented Korean Grammatical Error Correction with Large Language Models (Koo et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-emnlp.6.pdf