IndoCL: Benchmarking Indonesian Language Development Assessment

Nankai Lin (林楠铠); Hongyan Wu; Weixiong Zheng; Xingming Liao; Shengyi Jiang; Aimin Yang (阳爱民); Lixian Xiao

doi:10.18653/v1/2024.findings-emnlp.280

Abstract

Recently, the field of language acquisition (LA) has benefited significantly from natural language processing technologies. A crucial task in LA is tracking the evolution of language learners’ competence, namely language development assessment (LDA). However, most LDA research focuses on high-resource languages, with limited attention directed toward low-resource languages. Moreover, existing methodologies depend primarily on linguistic rules and language characteristics, and the use of pre-trained language models (PLMs) for LDA remains largely unexplored. In this paper, we construct the IndoCL corpus (Indonesian Corpus of L2 Learners), which comprises compositions written by undergraduate students majoring in the Indonesian language. We also propose a model for LDA tasks that automatically extracts language-independent features, reducing laborious computation and reliance on any specific language. The proposed model uses sequential information attention and similarity representation learning to capture, respectively, the differences and the common information between the first-written and second-written essays. It demonstrates remarkable performance on both our self-constructed corpus and publicly available corpora. Our work could serve as a novel benchmark for Indonesian LDA tasks. We also explore the feasibility of using existing large language models (LLMs) for LDA tasks. The results show significant potential for improving LLM performance in LDA tasks.

Anthology ID: 2024.findings-emnlp.280
Volume: Findings of the Association for Computational Linguistics: EMNLP 2024
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 4873–4885
URL: https://aclanthology.org/2024.findings-emnlp.280/
DOI: 10.18653/v1/2024.findings-emnlp.280
Cite (ACL): Nankai Lin, Hongyan Wu, Weixiong Zheng, Xingming Liao, Shengyi Jiang, Aimin Yang, and Lixian Xiao. 2024. IndoCL: Benchmarking Indonesian Language Development Assessment. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 4873–4885, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): IndoCL: Benchmarking Indonesian Language Development Assessment (Lin et al., Findings 2024)
PDF: https://aclanthology.org/2024.findings-emnlp.280.pdf
