Reza Farahbakhsh

2025

Towards Cross-Lingual Audio Abuse Detection in Low-Resource Settings with Few-Shot Learning
Aditya Narayan Sankaran | Reza Farahbakhsh | Noel Crespi
Proceedings of the 31st International Conference on Computational Linguistics

Detection of abusive content online remains underexplored, particularly in low-resource settings and in the audio modality. We investigate the potential of pre-trained audio representations for detecting abusive language in low-resource languages, specifically Indian languages, using Few-Shot Learning (FSL). Leveraging representations from models such as Wav2Vec and Whisper, we explore cross-lingual abuse detection on the ADIMA dataset. Our approach integrates these representations within the Model-Agnostic Meta-Learning (MAML) framework to classify abusive language in 10 languages. We experiment with shot sizes from 50 to 200 to evaluate the impact of limited data on performance, and we conduct a feature visualization study to better understand model behaviour. This study highlights the generalization ability of pre-trained models in low-resource scenarios and offers insights into detecting abusive language in multilingual contexts.

2024

Improving Cross-lingual Transfer with Contrastive Negative Learning and Self-training
Guanlin Li | Xuechen Zhao | Amir Jafari | Wenhao Shao | Reza Farahbakhsh | Noel Crespi
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Recent studies improve cross-lingual transfer learning either by better aligning the internal representations within the multilingual model or by exploiting information about the target language through self-training. However, alignment-based methods have intrinsic limitations, such as non-transferable linguistic elements, while most self-training-based methods ignore the useful information hidden in low-confidence samples. To address this issue, we propose CoNLST (Contrastive Negative Learning and Self-Training), which leverages the information in low-confidence samples. Specifically, we extend negative learning to the metric space by selecting negative pairs based on complementary labels, and then employ self-training to iteratively train the model until it converges on the resulting clean pseudo-labels. We evaluate our approach on the widely adopted cross-lingual benchmark XNLI. The experimental results show that our method improves upon the baseline models and can serve as a beneficial complement to alignment-based methods.
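As a rough illustration of the negative-learning idea the abstract builds on, the sketch below computes a plain complementary-label loss: given a label the sample is assumed not to belong to, the loss pushes the predicted probability of that label towards zero. This is standard negative learning on class probabilities, not the metric-space extension or the negative-pair selection that CoNLST proposes. The function names, the three-way logits (XNLI uses three classes), and the example values are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def negative_learning_loss(
    logits: Sequence[float], complementary_label: int, eps: float = 1e-12
) -> float:
    """Loss for a label the sample is assumed NOT to have.

    Computes -log(1 - p_k), where p_k is the predicted probability of the
    complementary label k. The loss is small when the model already assigns
    low probability to k and large when it is confident in k.
    """
    probs = softmax(logits)
    return -math.log(max(1.0 - probs[complementary_label], eps))


# Hypothetical logits for one sample over three classes.
example_logits = [2.0, 0.0, -1.0]
loss_if_not_class_0 = negative_learning_loss(example_logits, 0)
loss_if_not_class_2 = negative_learning_loss(example_logits, 2)
```

Declaring the most confident class (index 0) as the complementary label is penalised more heavily than declaring the least likely class (index 2), which is what allows a low-confidence sample to contribute a training signal without committing to a positive pseudo-label.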

2023

No offence, Bert - I insult only humans! Multilingual sentence-level attack on toxicity detection networks
Sergey Berezin | Reza Farahbakhsh | Noel Crespi
Findings of the Association for Computational Linguistics: EMNLP 2023

We introduce a simple yet efficient sentence-level attack on black-box toxicity detection models. By appending several positive words or sentences to the end of a hateful message, we are able to change the prediction of a neural network and pass the toxicity detection check. The approach is shown to work in seven languages from three different language families. We also describe a defence mechanism against this attack and discuss its limitations.
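The mechanics of such a suffix attack can be sketched in a few lines. The example below is only a toy under stated assumptions: `toy_toxicity_score` is a hypothetical length-normalised keyword scorer standing in for the black-box detector (the paper targets real neural models), and the word list, threshold, and function names are invented for illustration.

```python
from typing import Callable

# Hypothetical list of positive filler words.
POSITIVE_WORDS = ["love", "wonderful", "thanks", "great", "happy"]


def toy_toxicity_score(text: str) -> float:
    """Hypothetical black-box stand-in: fraction of tokens that are toxic."""
    toxic = {"idiot", "stupid"}
    tokens = [t.strip(".,!?") for t in text.lower().split()]
    if not tokens:
        return 0.0
    return sum(t in toxic for t in tokens) / len(tokens)


def append_positive_suffix(
    message: str,
    score_fn: Callable[[str], float],
    threshold: float = 0.2,
    max_words: int = 20,
) -> str:
    """Append positive words until the score drops below the threshold.

    Only the detector's output score is queried, so the detector is treated
    as a black box. Gives up after max_words appended words.
    """
    candidate = message
    for i in range(max_words):
        if score_fn(candidate) < threshold:
            return candidate
        candidate += " " + POSITIVE_WORDS[i % len(POSITIVE_WORDS)]
    return candidate


original = "you are a stupid idiot"
attacked = append_positive_suffix(original, toy_toxicity_score)
```

The original message is left intact and only a suffix is added, so the hateful content remains readable to a human while the detector's score is diluted. A real detector will not respond as predictably as this toy scorer.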

Co-authors

Wenhao Shao 1

Xuechen Zhao 1
