Data Augmentation through Back-Translation for Stereotypes and Irony Detection

Tom Bourgeade; Silvia Casola; Adel Mahmoud Wizan; Cristina Bosco


Abstract

Complex linguistic phenomena such as stereotypes or irony are still challenging to detect, particularly due to the lower availability of annotated data. In this paper, we explore Back-Translation (BT) as a data augmentation method to enhance such datasets by artificially introducing semantics-preserving variations. We investigate French and Italian as source languages on two multilingual datasets annotated for the presence of stereotypes or irony and evaluate French/Italian, English, and Arabic as pivot languages for the BT process. We also investigate cross-translation, i.e., augmenting one language subset of a multilingual dataset with translated instances from the other languages. We conduct an intrinsic evaluation of the quality of back-translated instances, identifying linguistic or translation model-specific errors that may occur with BT. We also perform an extrinsic evaluation of different data augmentation configurations to train a multilingual Transformer-based classifier for stereotype or irony detection on monolingual data.

Anthology ID: 2024.clicit-1.12
Volume: Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)
Month: December
Year: 2024
Address: Pisa, Italy
Editors: Felice Dell'Orletta, Alessandro Lenci, Simonetta Montemagni, Rachele Sprugnoli
Venue: CLiC-it
Publisher: CEUR Workshop Proceedings
Pages: 90–97
URL: https://aclanthology.org/2024.clicit-1.12/
Cite (ACL): Tom Bourgeade, Silvia Casola, Adel Mahmoud Wizan, and Cristina Bosco. 2024. Data Augmentation through Back-Translation for Stereotypes and Irony Detection. In Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024), pages 90–97, Pisa, Italy. CEUR Workshop Proceedings.
Cite (Informal): Data Augmentation through Back-Translation for Stereotypes and Irony Detection (Bourgeade et al., CLiC-it 2024)
PDF: https://aclanthology.org/2024.clicit-1.12.pdf
