Jay Gala


2023

pdf bib
A Federated Approach for Hate Speech Detection
Jay Gala | Deep Gandhi | Jash Mehta | Zeerak Talat
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Hate speech detection has been the subject of high research attention, due to the scale of content created on social media. In spite of the attention and the sensitive nature of the task, privacy preservation in hate speech detection has remained under-studied. The majority of research has focused on centralised machine learning infrastructures which risk leaking data. In this paper, we show that using federated machine learning can help address privacy the concerns that are inherent to hate speech detection while obtaining up to 6.81% improvement in terms of F1-score.

pdf bib
Developing State-Of-The-Art Massively Multilingual Machine Translation Systems for Related Languages
Jay Gala | Pranjal A. Chitale | Raj Dabre
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics: Tutorial Abstract

pdf bib
NICT-AI4B’s Submission to the Indic MT Shared Task in WMT 2023
Raj Dabre | Jay Gala | Pranjal Chitale
Proceedings of the Eighth Conference on Machine Translation

In this paper, we (Team NICT-AI4B) describe our MT systems that we submit to the Indic MT task in WMT 2023. Our primary system consists of 3 stages: Joint denoising and MT training using officially approved monolingual and parallel corpora, backtranslation and, MT training on original and backtranslated parallel corpora. We observe that backtranslation leads to substantial improvements in translation quality up to 4 BLEU points. We also develop 2 contrastive systems on unconstrained settings, where the first system involves fine-tuning of IndicTrans2 DA models on official parallel corpora and seed data used in AI4Bharat et al, (2023), and the second system involves a system combination of the primary and the aforementioned system. Overall, we manage to obtain high-quality translation systems for the 4 low-resource North-East Indian languages of focus.