Msvpj Sathvik
2024
Ukrainian Resilience: A Dataset for Detection of Help-Seeking Signals Amidst the Chaos of War
Msvpj Sathvik | Abhilash Dowpati | Srreyansh Sethi
Findings of the Association for Computational Linguistics: EMNLP 2024
We propose a novel dataset, “Ukrainian Resilience”, a collection of Ukrainian-language social media posts for detecting help-seeking posts during the Russia-Ukraine war. It is designed to support the analysis and categorization of subtle signals in these posts that indicate people are asking for help in wartime. We use language processing and machine learning techniques to pick up on the nuances of language that show distress or urgency. The dataset is labeled for binary classification: each post is marked as either requiring help or not requiring help. The dataset could significantly improve humanitarian efforts by allowing quicker and more targeted help for those facing the challenges of war. We also implement baseline models; GPT-3.5 achieved an accuracy of 81.15%.
French GossipPrompts: Dataset For Prevention of Generating French Gossip Stories By LLMs
Msvpj Sathvik | Abhilash Dowpati | Revanth Narra
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)
Large Language Models (LLMs) are evolving rapidly. State-of-the-art LLMs show an impressive ability to craft narratives from contextual cues, which reflects their skill in comprehending and producing human-like text. However, this same capability carries a risk: when prompted with specific contexts, LLMs may generate gossip. To mitigate this, we introduce a dataset named “French GossipPrompts”, designed for identifying prompts that lead to the creation of gossipy content in the French language. The dataset is labeled for binary classification, indicating whether a given prompt leads to gossip or not. It comprises a total of 7253 individual prompts. We developed classification models and achieved an accuracy of 89.95%.