Pinetree at SemEval-2026 Task 7: A Large-Scale Failure Analysis of Cultural Grounding in Language Models

Yen Yee Yam; Hong Meng Yam

Abstract

Using a simple prompting strategy without fine-tuning or retrieval augmentation, our system achieved 88.85% micro-average and 90.55% macro-average accuracy, ranking #4 overall on SemEval-2026 Task 7. Our primary contribution is a failure analysis of 5,241 incorrect predictions (11.15% of the dataset), categorized using the six-topic BLEnD taxonomy. As a share of all errors, failures concentrate in Food (39.42%) and Holidays/Celebration/Leisure (15.76%), but within-topic error rates are highest for Family (21.04%) and Work life (20.45%), both topics with limited representational density. Global-brand attractor errors account for only 2.50% of failures and are tightly localized: 98.5% fall on a single template (most popular sport team) in four low-resource cultures. Outside this template, brand-default effects are statistically negligible. These findings support representational sparsity and knowledge-density asymmetry, not ideological skew, as the dominant cause of cultural misalignment in everyday behavioral tasks.
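The abstract reports two different per-topic quantities: a topic's share of all errors and its within-topic error rate. The following is a minimal sketch of how the two can diverge. The function name `error_breakdown` and the toy counts are hypothetical illustrations and are not taken from the paper's data or code.

```python
from typing import Dict, Tuple


def error_breakdown(counts: Dict[str, Tuple[int, int]]) -> Dict[str, Tuple[float, float]]:
    """Map topic -> (share of all errors, within-topic error rate), both in percent.

    `counts` maps each topic to (number of items, number of wrong predictions).
    """
    total_wrong = sum(wrong for _, wrong in counts.values())
    result = {}
    for topic, (n_items, wrong) in counts.items():
        # Share of errors: this topic's errors relative to all errors.
        share = 100.0 * wrong / total_wrong if total_wrong else 0.0
        # Within-topic rate: this topic's errors relative to its own items.
        rate = 100.0 * wrong / n_items if n_items else 0.0
        result[topic] = (share, rate)
    return result


# Hypothetical toy counts (NOT the paper's data): topic -> (items, wrong predictions).
toy_counts = {"Food": (1000, 80), "Family": (100, 20)}
breakdown = error_breakdown(toy_counts)
for topic, (share, rate) in breakdown.items():
    print(f"{topic}: {share:.1f}% of all errors, {rate:.1f}% within-topic error rate")
```

In this toy setup the larger topic contributes most of the errors simply because it has more items, while the smaller topic has the higher error rate on its own items. The abstract describes the same kind of pattern for Food versus Family and Work life.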

Anthology ID: 2026.semeval-1.422
Volume: Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Ekaterina Kochmar, Debanjan Ghosh, Kai North, Mamoru Komachi
Venues: SemEval | WS
Publisher: Association for Computational Linguistics
Pages: 3399–3407
URL: https://aclanthology.org/2026.semeval-1.422/
Cite (ACL): Yen Yee Yam and Hong Meng Yam. 2026. Pinetree at SemEval-2026 Task 7: A Large-Scale Failure Analysis of Cultural Grounding in Language Models. In Proceedings of the 20th International Workshop on Semantic Evaluation (2026), pages 3399–3407, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): Pinetree at SemEval-2026 Task 7: A Large-Scale Failure Analysis of Cultural Grounding in Language Models (Yam & Yam, SemEval 2026)
PDF: https://aclanthology.org/2026.semeval-1.422.pdf
Supplementary material: 2026.semeval-1.422.SupplementaryMaterial.zip