%0 Conference Proceedings
%T Offensive Content Detection via Synthetic Code-Switched Text
%A Salaam, Cesa
%A Dernoncourt, Franck
%A Bui, Trung
%A Rawat, Danda
%A Yoon, Seunghyun
%Y Calzolari, Nicoletta
%Y Huang, Chu-Ren
%Y Kim, Hansaem
%Y Pustejovsky, James
%Y Wanner, Leo
%Y Choi, Key-Sun
%Y Ryu, Pum-Mo
%Y Chen, Hsin-Hsi
%Y Donatelli, Lucia
%Y Ji, Heng
%Y Kurohashi, Sadao
%Y Paggio, Patrizia
%Y Xue, Nianwen
%Y Kim, Seokhwan
%Y Hahm, Younggyun
%Y He, Zhong
%Y Lee, Tony Kyungil
%Y Santus, Enrico
%Y Bond, Francis
%Y Na, Seung-Hoon
%S Proceedings of the 29th International Conference on Computational Linguistics
%D 2022
%8 October
%I International Committee on Computational Linguistics
%C Gyeongju, Republic of Korea
%F salaam-etal-2022-offensive
%X The prevalent use of offensive content in social media has become an important reason for concern for online platforms (customer service chat-boxes, social media platforms, etc.). Classifying offensive and hate-speech content in online settings is an essential task in many applications that needs to be addressed accordingly. However, online text from online platforms can contain code-switching, a combination of more than one language. The non-availability of labeled code-switched data for low-resourced code-switching combinations adds difficulty to this problem. To overcome this, we release a real-world dataset containing around 10k samples for testing for three language combinations en-fr, en-es, and en-de, and a synthetic code-switched textual dataset containing ~30k samples for training. In this paper, we describe the process for gathering the human-generated data and our algorithm for creating synthetic code-switched offensive content data. We also introduce the results of a keyword classification baseline and a multi-lingual transformer-based classification model.
%U https://aclanthology.org/2022.coling-1.575
%P 6617-6624