Jash Mehta


2023

pdf bib
A Federated Approach for Hate Speech Detection
Jay Gala | Deep Gandhi | Jash Mehta | Zeerak Talat
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Hate speech detection has been the subject of high research attention, due to the scale of content created on social media. In spite of the attention and the sensitive nature of the task, privacy preservation in hate speech detection has remained under-studied. The majority of research has focused on centralised machine learning infrastructures which risk leaking data. In this paper, we show that using federated machine learning can help address privacy the concerns that are inherent to hate speech detection while obtaining up to 6.81% improvement in terms of F1-score.

2022

pdf bib
A Federated Approach to Predicting Emojis in Hindi Tweets
Deep Gandhi | Jash Mehta | Nirali Parekh | Karan Waghela | Lynette D’Mello | Zeerak Talat
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

The use of emojis affords a visual modality to, often private, textual communication.The task of predicting emojis however provides a challenge for machine learning as emoji use tends to cluster into the frequently used and the rarely used emojis.Much of the machine learning research on emoji use has focused on high resource languages and has conceptualised the task of predicting emojis around traditional server-side machine learning approaches.However, traditional machine learning approaches for private communication can introduce privacy concerns, as these approaches require all data to be transmitted to a central storage.In this paper, we seek to address the dual concerns of emphasising high resource languages for emoji prediction and risking the privacy of people’s data.We introduce a new dataset of 118k tweets (augmented from 25k unique tweets) for emoji prediction in Hindi, and propose a modification to the federated learning algorithm, CausalFedGSD, which aims to strike a balance between model performance and user privacy. We show that our approach obtains comparative scores with more complex centralised models while reducing the amount of data required to optimise the models and minimising risks to user privacy.

2021

pdf bib
IndicFed: A Federated Approach for Sentiment Analysis in Indic Languages
Jash Mehta | Deep Gandhi | Naitik Rathod | Sudhir Bagul
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

The task of sentiment analysis has been extensively studied in high-resource languages. Even though sentiment analysis is studied for some resource-constrained languages, the corpora and the datasets available in other low resource languages are scarce and fragmented. This prevents further research of resource-constrained languages and also inhibits model performance for these languages. Privacy concerns may also be raised while aggregating some datasets for training central models. Our work tries to steer the research of sentiment analysis for resource-constrained languages in the direction of Federated Learning. We conduct various experiments to compare server based and federated approaches for 4 Indic Languages - Marathi, Hindi, Bengali, and Telugu. Specifically, we show that a privacy preserving approach, Federated Learning surpasses traditional server trained LSTM model and exhibits comparable performance to other servers-side transformer models.