How to Make the Most of LLMs’ Grammatical Knowledge for Acceptability Judgments

Yusuke Ide; Yuto Nishida; Justin Vasselli; Miyu Oba; Yusuke Sakai; Hidetaka Kamigaito; Taro Watanabe

doi:10.18653/v1/2025.naacl-long.380

Abstract

The grammatical knowledge of language models (LMs) is often measured using a benchmark of linguistic minimal pairs, where LMs are presented with a pair of acceptable and unacceptable sentences and required to judge which is more acceptable. Conventional approaches compare sentence probabilities directly, but large language models (LLMs) provide nuanced evaluation methods using prompts and templates. We therefore investigate how to derive the most accurate acceptability judgments from LLMs to comprehensively evaluate their grammatical knowledge. Through extensive experiments in both English and Chinese, we compare nine judgment methods and demonstrate that two of them, in-template LP (a probability readout method) and Yes/No probability computing (a prompting-based method), achieve higher accuracy than the conventional approach. Our analysis reveals that the top two methods excel in different linguistic phenomena, suggesting they access different aspects of the LLMs’ grammatical knowledge. We find that ensembling the two methods achieves even higher accuracy. Consequently, we recommend these techniques, either individually or ensembled, as more effective alternatives to conventional approaches for assessing grammatical knowledge in LLMs.
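The minimal-pair setup described in the abstract can be sketched in a few lines. This is a rough illustration under stated assumptions, not the authors' implementation: the template wording, the `toy_lm_logprob` stand-in for a real model, and the margin-averaging ensemble rule are all hypothetical, since the abstract does not specify them. A Yes/No probability scorer would fit the same `Scorer` interface by returning the model's probability of answering "Yes" for a sentence.

```python
from typing import Callable, Sequence, Tuple

Scorer = Callable[[str], float]  # sentence -> score; higher means "more acceptable"
Pair = Tuple[str, str]           # (acceptable sentence, unacceptable sentence)

# Hypothetical template wording, for illustration only.
TEMPLATE = "The following sentence is grammatically acceptable.\n\n{sentence}"


def in_template(lm_logprob: Scorer, template: str = TEMPLATE) -> Scorer:
    """Score a sentence by the log-probability of the template with the sentence filled in."""
    return lambda sentence: lm_logprob(template.format(sentence=sentence))


def margin(scorer: Scorer, pair: Pair) -> float:
    """Score of the acceptable sentence minus score of the unacceptable one."""
    acceptable, unacceptable = pair
    return scorer(acceptable) - scorer(unacceptable)


def accuracy(scorer: Scorer, pairs: Sequence[Pair]) -> float:
    """Fraction of pairs where the acceptable sentence gets the strictly higher score."""
    return sum(margin(scorer, p) > 0 for p in pairs) / len(pairs)


def ensemble_accuracy(scorers: Sequence[Scorer], pairs: Sequence[Pair]) -> float:
    """Assumed ensembling rule: average the per-method margins, then take the sign."""
    def mean_margin(pair: Pair) -> float:
        return sum(margin(s, pair) for s in scorers) / len(scorers)
    return sum(mean_margin(p) > 0 for p in pairs) / len(pairs)


def toy_lm_logprob(text: str) -> float:
    """Stand-in for a real LM: a length penalty plus a penalty for one hard-coded error."""
    penalty = 5.0 if "cats sleeps" in text else 0.0
    return -float(len(text.split())) - penalty


pairs = [("The cats sleep.", "The cats sleeps.")]
print(accuracy(toy_lm_logprob, pairs))               # conventional: compare sentence scores
print(accuracy(in_template(toy_lm_logprob), pairs))  # in-template variant
print(ensemble_accuracy([toy_lm_logprob, in_template(toy_lm_logprob)], pairs))
```

In practice `toy_lm_logprob` would be replaced by the summed token log-probabilities from an actual LLM. Averaging raw margins also assumes the methods produce scores on comparable scales, which may not hold when mixing log-probabilities with Yes/No probabilities.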

Anthology ID: 2025.naacl-long.380
Original: 2025.naacl-long.380v1
Version 2: 2025.naacl-long.380v2
Volume: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month: April
Year: 2025
Address: Albuquerque, New Mexico
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 7416–7432
URL: https://aclanthology.org/2025.naacl-long.380/
DOI: 10.18653/v1/2025.naacl-long.380
Cite (ACL): Yusuke Ide, Yuto Nishida, Justin Vasselli, Miyu Oba, Yusuke Sakai, Hidetaka Kamigaito, and Taro Watanabe. 2025. How to Make the Most of LLMs’ Grammatical Knowledge for Acceptability Judgments. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 7416–7432, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): How to Make the Most of LLMs’ Grammatical Knowledge for Acceptability Judgments (Ide et al., NAACL 2025)
PDF: https://aclanthology.org/2025.naacl-long.380.pdf
