Sea_and_Wine at SemEval-2023 Task 9: A Regression Model with Data Augmentation for Multilingual Intimacy Analysis

Yuxi Chen; Yu Chang; Yanqing Tao; Yanru Zhang

doi:10.18653/v1/2023.semeval-1.9

Abstract

In Task 9, we are required to analyze the textual intimacy of tweets in 10 languages. We fine-tune the pre-trained XLM-RoBERTa (XLM-R) model to adapt it to this multilingual regression task. In preliminary experiments, we observed severe class imbalance in the officially released dataset, which may hinder convergence and degrade model performance. To tackle this challenge, we take measures in two respects. First, we apply data augmentation through machine translation to enlarge the classes with fewer samples. Second, we introduce a focal mean squared error (MSE) loss that emphasizes the contribution of hard samples to the total loss, further mitigating the impact of class imbalance on model performance. Extensive experiments demonstrate the effectiveness of our strategies, and our model achieves high performance, with a Pearson's correlation coefficient (CC) almost above 0.85 on the validation dataset.
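
The abstract does not state the formula for the focal MSE loss, so the following is only a minimal sketch of one common way to build such a loss, not the authors' implementation. It assumes a sigmoid-based scaling factor on the absolute error; the function names and the `beta` and `gamma` parameters are hypothetical and chosen for illustration.

```python
import math


def mse_loss(preds, targets):
    """Plain mean squared error, for comparison."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def focal_mse_loss(preds, targets, beta=1.0, gamma=1.0):
    """Illustrative focal-style MSE (assumed form, not taken from the paper).

    Each squared error is multiplied by a weight in [0, 1) that grows with
    the absolute error, so hard samples (large error) contribute relatively
    more to the total loss than easy ones.
    """
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    total = 0.0
    for p, t in zip(preds, targets):
        err = abs(p - t)
        # 2 * sigmoid(beta * err) - 1 maps err = 0 to 0 and large err toward 1.
        weight = (2.0 / (1.0 + math.exp(-beta * err)) - 1.0) ** gamma
        total += weight * err ** 2
    return total / len(preds)


if __name__ == "__main__":
    # Hypothetical intimacy scores: the last prediction is a "hard" sample.
    predictions = [1.1, 2.0, 4.5]
    labels = [1.0, 2.2, 2.5]
    print("MSE:      ", mse_loss(predictions, labels))
    print("Focal MSE:", focal_mse_loss(predictions, labels))
```

With `gamma=0` every weight equals 1 and the sketch reduces to ordinary MSE; larger `gamma` suppresses easy samples more strongly.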

Anthology ID: 2023.semeval-1.9
Volume: Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Atul Kr. Ojha, A. Seza Doğruöz, Giovanni Da San Martino, Harish Tayyar Madabushi, Ritesh Kumar, Elisa Sartori
Venue: SemEval
SIG: SIGLEX
Publisher: Association for Computational Linguistics
Pages: 77–82
URL: https://aclanthology.org/2023.semeval-1.9/
DOI: 10.18653/v1/2023.semeval-1.9
Cite (ACL): Yuxi Chen, Yu Chang, Yanqing Tao, and Yanru Zhang. 2023. Sea_and_Wine at SemEval-2023 Task 9: A Regression Model with Data Augmentation for Multilingual Intimacy Analysis. In Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), pages 77–82, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Sea_and_Wine at SemEval-2023 Task 9: A Regression Model with Data Augmentation for Multilingual Intimacy Analysis (Chen et al., SemEval 2023)
PDF: https://aclanthology.org/2023.semeval-1.9.pdf
