NLP-CIMAT at SemEval-2026 Task 9: LLM-Based One-Shot and Cross-Lingual Data Augmentation for Polarization Detection

Miriam Calderon-Reyes; Fernando Sanchez-Vega; Adrian Pastor Lopez Monroy

Abstract

This paper describes our participation in SemEval-2026 Task 9: Multilingual Text Polarization. The task requires estimating polarization levels across languages, where linguistic variability and limited annotated data pose significant challenges. To address data scarcity, we propose a pipeline that combines cross-lingual translation, synthetic data augmentation with large language models (LLMs), and domain-specific pre-trained models. Our approach rests on the hypothesis that polarization signals transfer across languages without substantial loss of semantic alignment, which makes translation an effective form of data augmentation. In addition, one-shot generation of synthetic examples proves to be a viable strategy for enriching training data in topic-specific scenarios. Experimental results show high stability and competitive performance, with macro F1-scores of 0.7869 for Spanish and 0.7939 for English on the test set. Our system ranked 21st on the official English leaderboard and 7th on the Spanish leaderboard, where its results are competitive with the top-performing systems.
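The augmentation idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Example` type, the binary label encoding, the prompt wording, and the `translate` and `generate` callables (stand-ins for a machine-translation system and an LLM call) are all hypothetical. The sketch only shows the two steps the abstract names: translating examples from another language while keeping their labels, and producing one synthetic example per seed from a one-shot prompt.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Example:
    text: str
    label: int  # assumed binary: 1 = polarized, 0 = not polarized
    lang: str   # language code, e.g. "es" or "en"


def build_one_shot_prompt(seed: Example, topic: str) -> str:
    """Hypothetical one-shot prompt: a single labeled seed conditions the LLM."""
    stance = "polarized" if seed.label == 1 else "non-polarized"
    return (
        f"Here is a {stance} post about {topic}:\n"
        f'"{seed.text}"\n'
        f"Write one new {stance} post about the same topic in '{seed.lang}'."
    )


def augment(
    train: List[Example],
    translate: Callable[[str, str, str], str],  # hypothetical MT: (text, src, tgt) -> text
    generate: Callable[[str], str],             # hypothetical LLM: prompt -> text
    target_lang: str,
    topic: str,
) -> List[Example]:
    """Return the original data plus translated and one-shot synthetic examples."""
    augmented = list(train)
    for ex in train:
        if ex.lang != target_lang:
            # Cross-lingual step: keep the label, assuming polarization
            # survives translation without substantial semantic drift.
            new_text = translate(ex.text, ex.lang, target_lang)
            augmented.append(Example(new_text, ex.label, target_lang))
        else:
            # One-shot step: the synthetic example inherits the seed's label.
            new_text = generate(build_one_shot_prompt(ex, topic))
            augmented.append(Example(new_text, ex.label, ex.lang))
    return augmented


# Usage with stub functions standing in for real MT and LLM calls.
def fake_translate(text: str, src: str, tgt: str) -> str:
    return f"[{src}->{tgt}] {text}"


def fake_generate(prompt: str) -> str:
    # Echo the quoted seed line so the output is easy to inspect.
    return "synthetic: " + prompt.splitlines()[1]


train = [
    Example("Los políticos solo mienten", 1, "es"),
    Example("The meeting is at noon", 0, "en"),
]
out = augment(train, fake_translate, fake_generate, target_lang="en", topic="politics")
for ex in out:
    print(ex.lang, ex.label, ex.text)
```

The design choice doing the work here is label inheritance: both the translated and the generated example reuse the seed's label, which is exactly where the cross-lingual transfer hypothesis stated in the abstract is assumed to hold.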

Anthology ID: 2026.semeval-1.362
Volume: Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Ekaterina Kochmar, Debanjan Ghosh, Kai North, Mamoru Komachi
Venues: SemEval | WS
Publisher: Association for Computational Linguistics
Pages: 2886–2893
URL: https://aclanthology.org/2026.semeval-1.362/
Cite (ACL): Miriam Calderon-Reyes, Fernando Sanchez-Vega, and Adrian Pastor Lopez Monroy. 2026. NLP-CIMAT at SemEval-2026 Task 9: LLM-Based One-Shot and Cross-Lingual Data Augmentation for Polarization Detection. In Proceedings of the 20th International Workshop on Semantic Evaluation (2026), pages 2886–2893, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): NLP-CIMAT at SemEval-2026 Task 9: LLM-Based One-Shot and Cross-Lingual Data Augmentation for Polarization Detection (Calderon-Reyes et al., SemEval 2026)
PDF: https://aclanthology.org/2026.semeval-1.362.pdf
Supplementary material: 2026.semeval-1.362.SupplementaryMaterial.zip
