WADER at SemEval-2023 Task 9: A Weak-labelling framework for Data augmentation in tExt Regression Tasks

Manan Suri; Aaryak Garg; Divya Chaudhary; Ian Gorton; Bijendra Kumar

doi:10.18653/v1/2023.semeval-1.267

Abstract

Intimacy is an essential element of human relationships, and language is a crucial means of conveying it. Textual intimacy analysis can reveal social norms in different contexts and serve as a benchmark for testing computational models’ ability to understand social information. In this paper, we propose WADER, a novel weak-labeling strategy for data augmentation in text regression tasks. WADER uses data augmentation to address the problems of data imbalance and data scarcity, and it provides a method for data augmentation in cross-lingual, zero-shot tasks. We benchmark the performance of state-of-the-art pre-trained multilingual language models using WADER and analyze the use of sampling techniques to mitigate bias in the data and to optimally select augmentation candidates. Our results show that WADER outperforms the baseline model and provides a direction for mitigating data imbalance and scarcity in text regression tasks.
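The abstract does not describe how the weak labels are produced or which sampling techniques are used. As a rough illustration of the general idea only, and not the authors' implementation, the sketch below assumes that a regressor already trained on the gold data assigns weak (predicted) scores to unlabelled candidate texts, and that candidates are then sampled with inverse-frequency weights so that score ranges rare in the gold data are favoured. The `predict` callable, the equal-width binning, the bin count, and the weighting scheme are all hypothetical choices.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def weak_label(candidates: Sequence[str], predict: Callable[[str], float]) -> List[Tuple[str, float]]:
    """Attach a model-predicted (weak) regression label to each unlabelled text."""
    return [(text, predict(text)) for text in candidates]


def bin_index(y: float, lo: float, hi: float, n_bins: int) -> int:
    """Map a score in [lo, hi] to one of n_bins equal-width bins (clamped at the edges)."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    frac = (y - lo) / (hi - lo)
    return min(max(int(frac * n_bins), 0), n_bins - 1)


def select_candidates(
    gold_labels: Sequence[float],
    weak: Sequence[Tuple[str, float]],
    k: int,
    lo: float,
    hi: float,
    n_bins: int = 5,
    seed: int = 0,
) -> List[Tuple[str, float]]:
    """Sample up to k weak-labelled candidates, favouring bins that are rare in the gold data."""
    gold_counts = Counter(bin_index(y, lo, hi, n_bins) for y in gold_labels)
    # Hypothetical inverse-frequency weight per candidate; +1 smoothing keeps empty bins finite.
    weights = [1.0 / (1 + gold_counts[bin_index(y, lo, hi, n_bins)]) for _, y in weak]
    rng = random.Random(seed)
    pool = list(weak)
    chosen: List[Tuple[str, float]] = []
    # Weighted sampling without replacement: draw one item, then remove it and its weight.
    for _ in range(min(k, len(pool))):
        i = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(i))
        weights.pop(i)
    return chosen


def toy_predict(text: str) -> float:
    """Hypothetical stand-in for a trained regressor."""
    return {"text a": 1.3, "text b": 4.8, "text c": 4.2, "text d": 1.6}[text]


if __name__ == "__main__":
    # Toy gold scores on an assumed 1-5 scale, skewed towards low values.
    gold = [1.2, 1.5, 1.8, 2.0, 1.1, 4.6]
    weak = weak_label(["text a", "text b", "text c", "text d"], toy_predict)
    print(select_candidates(gold, weak, k=2, lo=1.0, hi=5.0))
```

Because the gold scores in the toy data cluster at the low end, the high-scoring candidates receive larger weights and are more likely to be drawn, which is one simple way a sampling step could counteract label imbalance.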

Anthology ID:: 2023.semeval-1.267
Volume:: Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
Month:: July
Year:: 2023
Address:: Toronto, Canada
Editors:: Atul Kr. Ojha, A. Seza Doğruöz, Giovanni Da San Martino, Harish Tayyar Madabushi, Ritesh Kumar, Elisa Sartori
Venue:: SemEval
SIG:: SIGLEX
Publisher:: Association for Computational Linguistics
Pages:: 1945–1952
URL:: https://aclanthology.org/2023.semeval-1.267/
DOI:: 10.18653/v1/2023.semeval-1.267
Cite (ACL):: Manan Suri, Aaryak Garg, Divya Chaudhary, Ian Gorton, and Bijendra Kumar. 2023. WADER at SemEval-2023 Task 9: A Weak-labelling framework for Data augmentation in tExt Regression Tasks. In Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), pages 1945–1952, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):: WADER at SemEval-2023 Task 9: A Weak-labelling framework for Data augmentation in tExt Regression Tasks (Suri et al., SemEval 2023)
PDF:: https://aclanthology.org/2023.semeval-1.267.pdf
