Rehab Alsudais
2025
Does Machine Translation Impact Offensive Language Identification? The Case of Indo-Aryan Languages
Alphaeus Dmonte | Shrey Satapara | Rehab Alsudais | Tharindu Ranasinghe | Marcos Zampieri
Proceedings of the First Workshop on Language Models for Low-Resource Languages
The accessibility of social media platforms can be improved with the use of machine translation (MT). Non-standard features present in user-generated content on social media, such as hashtags, emojis, and alternative spellings, can cause MT systems to mistranslate instances. In this paper, we investigate the impact of MT on offensive language identification in Indo-Aryan languages. We use both original and MT datasets to evaluate the performance of various offensive language identification models. Our evaluation indicates that offensive language identification models achieve better performance on original data than on MT data, and that models trained on MT data identify offensive language more precisely on MT data than models trained on original data do.