Linguistic proficiency of humans and LLMs in Japanese: Effects of task demands and content

May Lynn Reese, Anastasia Smirnova


Abstract
We evaluate the linguistic proficiency of humans and LLMs on pronoun resolution in Japanese, using the Winograd Schema Challenge dataset. Humans outperform LLMs in the baseline condition, but we find evidence of task demand effects in both humans and LLMs. We also find that LLMs surpass human performance on scenarios referencing US culture, providing strong evidence for content effects.
Anthology ID:
2025.aimecon-main.22
Volume:
Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Full Papers
Month:
October
Year:
2025
Address:
Wyndham Grand Pittsburgh, Downtown, Pittsburgh, Pennsylvania, United States
Editors:
Joshua Wilson, Christopher Ormerod, Magdalen Beiting Parrish
Venue:
AIME-Con
Publisher:
National Council on Measurement in Education (NCME)
Pages:
201–211
URL:
https://aclanthology.org/2025.aimecon-main.22/
Cite (ACL):
May Lynn Reese and Anastasia Smirnova. 2025. Linguistic proficiency of humans and LLMs in Japanese: Effects of task demands and content. In Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Full Papers, pages 201–211, Wyndham Grand Pittsburgh, Downtown, Pittsburgh, Pennsylvania, United States. National Council on Measurement in Education (NCME).
Cite (Informal):
Linguistic proficiency of humans and LLMs in Japanese: Effects of task demands and content (Reese & Smirnova, AIME-Con 2025)
PDF:
https://aclanthology.org/2025.aimecon-main.22.pdf