Adel Mahmoud Wizan


2024

pdf bib
Data Augmentation through Back-Translation for Stereotypes and Irony Detection
Tom Bourgeade | Silvia Casola | Adel Mahmoud Wizan | Cristina Bosco
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)

Complex linguistic phenomena such as stereotypes or irony are still challenging to detect, particularly due to the lower availability of annotated data. In this paper, we explore Back-Translation (BT) as a data augmentation method to enhance such datasets by artificially introducing semantics-preserving variations. We investigate French and Italian as source languages on two multilingual datasets annotated for the presence of stereotypes or irony and evaluate French/Italian, English, andArabic as pivot languages for the BT process. We also investigate cross-translation, i.e., augmenting one language subset of a multilingual dataset with translated instances from the other languages. We conduct an intrinsic evaluation of the quality of back-translated instances, identifying linguistic or translation model-specific errors that may occur with BT. We also perform an extrinsic evaluation of different data augmentation configurations to train a multilingual Transformer-based classifier forstereotype or irony detection on mono-lingual data.