SMASH at SemEval-2026 Task 9: Detecting Multilingual Polarisation with Encoder Ensembles and Calibrated Decision Thresholds

Zahra Bokaei; Alessandra Terranova; Yi Zheng; Tom Bidewell; Björn Ross

Abstract

This paper describes the SMASH submission to SemEval-2026 Task 9 on multilingual, multicultural, and multi-event polarisation detection. The task comprises (i) binary polarisation detection, (ii) multi-label classification of polarisation types, and (iii) multi-label identification of polarisation manifestations across all available languages. We propose a language-adaptive ensemble framework that combines monolingual and multilingual encoder-only transformers with a principled out-of-fold (OOF) threshold tuning strategy. Instead of relying on fixed probability thresholds, we jointly tune ensemble weights and class-wise decision thresholds to directly optimise macro-F1, the official evaluation metric. Our experiments show that (1) monolingual encoders dominate in several high-resource languages but benefit from complementary multilingual signals, (2) no single multilingual backbone universally outperforms the others across languages and subtasks, and (3) language-specific class threshold tuning substantially improves performance because class distributions vary widely across languages. These results indicate that careful logit-level ensembling and threshold tuning yield strong performance for multilingual, imbalanced, multi-label polarisation detection. Across 22 evaluation languages, SMASH ranks in the top three for 5 languages in Subtask 1, 14 languages in Subtask 2, and 16 languages in Subtask 3. Our system achieves average macro-F1 scores of 0.81, 0.62, and 0.53 for Subtasks 1, 2, and 3, respectively.
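The joint tuning of ensemble weights and class-wise thresholds described above can be illustrated with a minimal sketch. It assumes only two models, a plain grid search, and out-of-fold logits already collected into arrays; the function names (`tune_ensemble`, `tune_thresholds`, `f1_per_class`) and the grids are hypothetical and are not taken from the paper, whose actual procedure may differ.

```python
import numpy as np


def f1_per_class(y_true, y_pred):
    """Column-wise F1 for 0/1 indicator matrices of shape (n_samples, n_classes)."""
    tp = np.sum((y_true == 1) & (y_pred == 1), axis=0)
    fp = np.sum((y_true == 0) & (y_pred == 1), axis=0)
    fn = np.sum((y_true == 1) & (y_pred == 0), axis=0)
    denom = 2 * tp + fp + fn
    # F1 is defined as 0 when a class has neither positives nor predictions.
    return np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def tune_thresholds(probs, y_true, grid):
    """For each class, pick the grid threshold with the highest F1.

    Macro-F1 is the mean of per-class F1 scores, so choosing each class
    threshold independently maximises macro-F1 over the grid.
    """
    scores = np.stack(
        [f1_per_class(y_true, (probs >= t).astype(int)) for t in grid]
    )
    best = np.argmax(scores, axis=0)
    return grid[best], scores[best, np.arange(probs.shape[1])]


def tune_ensemble(logits_a, logits_b, y_true, weights, grid):
    """Grid-search a logit-level blend weight together with class thresholds.

    Assumption: logits_a and logits_b are out-of-fold logits from two
    hypothetical encoders, each of shape (n_samples, n_classes).
    Returns (macro_f1, weight, thresholds) for the best blend found.
    """
    best = None
    for w in weights:
        probs = sigmoid(w * logits_a + (1.0 - w) * logits_b)
        thresholds, class_f1 = tune_thresholds(probs, y_true, grid)
        macro_f1 = float(class_f1.mean())
        if best is None or macro_f1 > best[0]:
            best = (macro_f1, float(w), thresholds)
    return best
```

Because the thresholds are selected on out-of-fold predictions, every training example is scored by a model that never saw it, which reduces the risk of fitting the thresholds to memorised training labels. Running the search separately per language would mirror the language-specific tuning the abstract reports.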

Anthology ID: 2026.semeval-1.116
Volume: Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Ekaterina Kochmar, Debanjan Ghosh, Kai North, Mamoru Komachi
Venues: SemEval | WS
Publisher: Association for Computational Linguistics
Pages: 832–848
URL: https://aclanthology.org/2026.semeval-1.116/
Cite (ACL): Zahra Bokaei, Alessandra Terranova, Yi Zheng, Tom Bidewell, and Björn Ross. 2026. SMASH at SemEval-2026 Task 9: Detecting Multilingual Polarisation with Encoder Ensembles and Calibrated Decision Thresholds. In Proceedings of the 20th International Workshop on Semantic Evaluation (2026), pages 832–848, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): SMASH at SemEval-2026 Task 9: Detecting Multilingual Polarisation with Encoder Ensembles and Calibrated Decision Thresholds (Bokaei et al., SemEval 2026)
PDF: https://aclanthology.org/2026.semeval-1.116.pdf
