A Federated Approach to Predicting Emojis in Hindi Tweets

Deep Gandhi; Jash Mehta; Nirali Parekh; Karan Waghela; Lynette D’Mello; Zeerak Talat

doi:10.18653/v1/2022.emnlp-main.819

Abstract

The use of emojis affords a visual modality to, often private, textual communication. The task of predicting emojis, however, provides a challenge for machine learning, as emoji use tends to cluster into the frequently used and the rarely used emojis. Much of the machine learning research on emoji use has focused on high resource languages and has conceptualised the task of predicting emojis around traditional server-side machine learning approaches. However, traditional machine learning approaches for private communication can introduce privacy concerns, as these approaches require all data to be transmitted to a central storage. In this paper, we seek to address the dual concerns of emphasising high resource languages for emoji prediction and risking the privacy of people’s data. We introduce a new dataset of 118k tweets (augmented from 25k unique tweets) for emoji prediction in Hindi, and propose a modification to the federated learning algorithm, CausalFedGSD, which aims to strike a balance between model performance and user privacy. We show that our approach obtains comparable scores with more complex centralised models while reducing the amount of data required to optimise the models and minimising risks to user privacy.

Anthology ID: 2022.emnlp-main.819
Volume: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 11951–11961
URL: https://aclanthology.org/2022.emnlp-main.819/
DOI: 10.18653/v1/2022.emnlp-main.819
Cite (ACL): Deep Gandhi, Jash Mehta, Nirali Parekh, Karan Waghela, Lynette D’Mello, and Zeerak Talat. 2022. A Federated Approach to Predicting Emojis in Hindi Tweets. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 11951–11961, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): A Federated Approach to Predicting Emojis in Hindi Tweets (Gandhi et al., EMNLP 2022)
PDF: https://aclanthology.org/2022.emnlp-main.819.pdf
