Multilingual Native Language Identification with Large Language Models

Dhiman Goswami; Marcos Zampieri; Kai North; Shervin Malmasi; Antonios Anastasopoulos

doi:10.18653/v1/2025.naacl-srw.19

Abstract

Native Language Identification (NLI) is the task of automatically identifying the native language (L1) of individuals based on their second language (L2) production. The introduction of Large Language Models (LLMs) with billions of parameters has renewed interest in text-based NLI, with new studies exploring LLM-based approaches to NLI on English L2. The capabilities of state-of-the-art LLMs on non-English NLI corpora, however, have not yet been fully evaluated. To fill this gap, we present the first evaluation of LLMs for multilingual NLI. We compare the performance of several LLMs with that of traditional statistical machine learning models and language-specific BERT-based models on NLI corpora in English, Italian, Norwegian, and Portuguese. Our results show that fine-tuned GPT-4 models achieve state-of-the-art NLI performance.

Anthology ID: 2025.naacl-srw.19
Volume: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)
Month: April
Year: 2025
Address: Albuquerque, USA
Editors: Abteen Ebrahimi, Samar Haider, Emmy Liu, Sammar Haider, Maria Leonor Pacheco, Shira Wein
Venues: NAACL | WS
Publisher: Association for Computational Linguistics
Pages: 193–199
URL: https://aclanthology.org/2025.naacl-srw.19/
DOI: 10.18653/v1/2025.naacl-srw.19
Cite (ACL): Dhiman Goswami, Marcos Zampieri, Kai North, Shervin Malmasi, and Antonios Anastasopoulos. 2025. Multilingual Native Language Identification with Large Language Models. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop), pages 193–199, Albuquerque, USA. Association for Computational Linguistics.
Cite (Informal): Multilingual Native Language Identification with Large Language Models (Goswami et al., NAACL 2025)
PDF: https://aclanthology.org/2025.naacl-srw.19.pdf
