Could language models win the International Linguistics Olympiad?

Jamie Garnham; Ehsan Shareghi

Abstract

Linguistic puzzles, wherein the solver must deduce the rules of an unfamiliar language purely in-context, represent a uniquely perplexing problem format even for state-of-the-art large language models. Yet by exploring various inference-time scaling methods, we demonstrate that language models’ performance on these problems can be improved without fine-tuning or supplementary linguistic context. To this end, this paper introduces the first domain-specific inference-time scaling framework for linguistic puzzles, which we use to improve the performance of models from three families, namely R1 (DeepSeek), Gemini 2.5 Flash (Google), and Llama 3.3 70B Instruct (Meta), on a challenging Linguistics Olympiad-based benchmark by 4.9, 13.1, and 4.9 percentage points, respectively. Nonetheless, even when multiple optimisations are applied, we find that LLMs’ linguistic puzzle performance remains well below their performance on comparable mathematical and commonsense benchmarks, and we speculate as to why linguistic reasoning continues to pose a distinctive challenge for even the most capable large language models.

Anthology ID: 2026.conll-main.28
Volume: Proceedings of the 30th Conference on Computational Natural Language Learning
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Claire Bonial, Yevgeni Berzak
Venues: CoNLL | WS
Publisher: Association for Computational Linguistics
Pages: 481–500
URL: https://aclanthology.org/2026.conll-main.28/
Cite (ACL): Jamie Garnham and Ehsan Shareghi. 2026. Could language models win the International Linguistics Olympiad? In Proceedings of the 30th Conference on Computational Natural Language Learning, pages 481–500, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): Could language models win the International Linguistics Olympiad? (Garnham & Shareghi, CoNLL 2026)
PDF: https://aclanthology.org/2026.conll-main.28.pdf