LOBSTER: Linguistics Olympiad Benchmark for Structured Evaluation on Reasoning
Da-Chen Lian | Ri-Sheng Huang | Pin-Er Chen | Chunki Lim | You-Kuan Lin | Guan-Yu Tseng | Zhen-Yu Lin | Pin-Cheng Chen | Shu-Kai Hsieh
Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)
We propose the Linguistics Olympiad Benchmark for Structured Evaluation on Reasoning, or LOBSTER, a linguistically informed benchmark designed to evaluate large language models (LLMs) on complex linguistic puzzles from the International Linguistics Olympiad (IOL). Unlike prior benchmarks that focus solely on final-answer accuracy, our benchmark provides concrete evaluation protocols and rich typological metadata covering over 90 low-resource and cross-cultural languages alongside the puzzles. Through systematic evaluations of the multilingual abilities of state-of-the-art models, we demonstrate that LLMs struggle with low-resource languages, underscoring the need for such a benchmark. Experiments with various models on our benchmark show that IOL problems remain challenging for reasoning models, though there are ways to enhance performance: for example, iterative reasoning outperforms single-pass approaches in both final answers and explanations. Our benchmark offers a comprehensive foundation for advancing linguistically grounded, culturally informed, and cognitively plausible reasoning in LLMs.