GUIR at SemEval-2026 Task 7: Probing Cultural Knowledge in LLMs via Multi-Agent Debate

Reihaneh Iranmanesh; Ophir Frieder; Nazli Goharian

Abstract

We present the GUIR system for SemEval-2026 Task 7, Everyday Knowledge Across Diverse Languages and Cultures, which probes the extent to which general-purpose LLMs encode cultural knowledge without any culture-specific supervision or fine-tuning. Our system addresses two tracks built on the BLEnD benchmark. For the short-answer question (SAQ) track, we employ zero-shot prompting with gpt-4.1, achieving 55.5% accuracy across 61 language locales. For the multiple-choice question (MCQ) track, we propose a three-stage pipeline: (1) zero-shot chain-of-thought inference with gpt-5-mini, (2) cross-locale majority voting to correct inconsistent predictions, and (3) a multi-agent debate protocol in which three LLM instances argue and adjudicate over residual errors. This pipeline achieves 97.47% overall accuracy across 30 locales, ranking first among all submitted systems on the MCQ track. We further conduct a targeted human evaluation on the Persian locale, which shows that BLEnD’s lemma-matching scorer systematically underestimates model performance: human annotators score the system 18 percentage points higher than the lemma-matching evaluation does. This finding highlights the need for better evaluation methods for morphologically rich languages such as Persian.
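The abstract does not spell out how stage (2), cross-locale majority voting, is implemented. The following is only a minimal sketch of one plausible reading, under the assumption that the same underlying question is posed in several locales and that a prediction disagreeing with a unique majority label is overwritten by that label. The function name `cross_locale_majority_vote`, the `(question_id, locale)` keying, the tie-handling rule, and the example data are all hypothetical and not taken from the paper.

```python
from collections import Counter, defaultdict


def cross_locale_majority_vote(predictions):
    """Align predictions for the same question across locales.

    predictions: dict mapping (question_id, locale) -> predicted option label.
    Returns a new dict with the same keys. This is an illustrative
    assumption about the voting step, not the authors' implementation.
    """
    # Collect every label predicted for each question, regardless of locale.
    labels_by_question = defaultdict(list)
    for (question_id, _locale), label in predictions.items():
        labels_by_question[question_id].append(label)

    corrected = {}
    for (question_id, locale), label in predictions.items():
        counts = Counter(labels_by_question[question_id]).most_common()
        top_label, top_count = counts[0]
        # Only overwrite when one label strictly beats all others;
        # on a tie the original prediction is kept (assumed policy).
        has_unique_winner = len(counts) == 1 or counts[1][1] < top_count
        corrected[(question_id, locale)] = top_label if has_unique_winner else label
    return corrected


# Hypothetical predictions: q1 has a clear majority, q2 is tied.
example = {
    ("q1", "en-GB"): "A",
    ("q1", "fa-IR"): "A",
    ("q1", "es-MX"): "C",
    ("q2", "en-GB"): "B",
    ("q2", "fa-IR"): "D",
}
result = cross_locale_majority_vote(example)
```

In this sketch the outlier for `q1` is replaced by the majority label, while the tied predictions for `q2` are left unchanged; in the pipeline described above, such unresolved cases would presumably be candidates for the debate stage.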

Anthology ID: 2026.semeval-1.438
Volume: Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Ekaterina Kochmar, Debanjan Ghosh, Kai North, Mamoru Komachi
Venues: SemEval | WS
Publisher: Association for Computational Linguistics
Pages: 3549–3561
URL: https://aclanthology.org/2026.semeval-1.438/
Cite (ACL): Reihaneh Iranmanesh, Ophir Frieder, and Nazli Goharian. 2026. GUIR at SemEval-2026 Task 7: Probing Cultural Knowledge in LLMs via Multi-Agent Debate. In Proceedings of the 20th International Workshop on Semantic Evaluation (2026), pages 3549–3561, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): GUIR at SemEval-2026 Task 7: Probing Cultural Knowledge in LLMs via Multi-Agent Debate (Iranmanesh et al., SemEval 2026)
PDF: https://aclanthology.org/2026.semeval-1.438.pdf
Supplementary material: 2026.semeval-1.438.SupplementaryMaterial.zip
