Unveiling the Linguistic Acceptability Judgments of Large Language Models in Multilingual Contexts
Fuyu Xing | Haoyu Huang | Dawei Mo | Xinzhuo Yang | Zixuan Gao | Wei Wang | Zimu Wang | Haiyang Zhang
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)
"Linguistic acceptability judgments are essential for evaluating how language models internalize human-like grammatical knowledge. Though some studies have evaluated large language models (LLMs) in this context, existing research lacks a systematic exploration of diverse learning paradigms in multilingual settings. In this paper, we present the first multilingual evaluation of LLMs on linguistic acceptability across four languages (English, Chinese, Japanese, and Russian). Our evaluation spans both general-purpose models (i.e., GPT-4o, GPT-4o mini, DeepSeek-V3, GLM-4-32B, and the Qwen series) and reasoning-oriented models (QwQ-32B-Preview and DeepSeek-R1-32B) under zero-shot prompting and under monolingual, cross-lingual, and multilingual fine-tuning settings, with comparisons to pre-trained language model (PLM) baselines. Our analysis highlights the strong generalizability of large-scale LLMs under zero-shot prompting, the challenges of fine-tuning small-sized LLMs on skewed training data, the effectiveness of multilingual fine-tuning for low-resource languages, the scaling law exhibited on the task, and the limitations of reasoning-oriented models, even when "aha moments" occur during the reasoning process."