May Lynn Reese


2025

We evaluate linguistic proficiency of humans and LLMs on pronoun resolution in Japanese, using the Winograd Schema Challenge dataset. Humans outperform LLMs in the baseline condition, but we find evidence for task demand effectss in both humans and LLMs. We also found that LLMs surpass human performance in scenarios referencing US culture, providing strong evidence for content effects.