Guneet Singh Kohli


2024

Stance and Hate Event Detection in Tweets Related to Climate Activism - Shared Task at CASE 2024
Surendrabikram Thapa | Kritesh Rauniyar | Farhan Jafri | Shuvam Shiwakoti | Hariram Veeramani | Raghav Jain | Guneet Singh Kohli | Ali Hürriyetoğlu | Usman Naseem
Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024)

Social media plays a pivotal role in global discussions, including those on climate change. The variety of opinions expressed ranges from supportive to oppositional, with some instances of hate speech. Recognizing the importance of understanding these varied perspectives, the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE) at EACL 2024 hosted a shared task focused on detecting stances and hate speech in climate activism-related tweets. The task was divided into three subtasks: subtasks A and B concentrated on identifying hate speech and its targets, while subtask C focused on stance detection. Participants’ performance was evaluated using the macro F1-score. With over 100 teams participating, the highest F1 scores achieved were 91.44% in subtask C, 78.58% in subtask B, and 74.83% in subtask A. This paper details the methodologies of the 24 teams that submitted their results to the competition’s leaderboard.
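The macro F1-score mentioned in the abstract averages per-class F1 scores with equal weight, so rare classes count as much as frequent ones. The following is a minimal sketch of that metric in plain Python; the function name `macro_f1` and the stance labels in the example are illustrative assumptions, not taken from the shared task's evaluation script.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gets a false positive
            fn[truth] += 1  # true label gets a false negative
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical stance labels, for illustration only.
gold = ["support", "oppose", "neutral", "support"]
pred = ["support", "neutral", "neutral", "support"]
print(round(macro_f1(gold, pred), 4))  # 0.5556
```

In this example the per-class F1 scores are 1.0 (support), 0.0 (oppose), and 2/3 (neutral), so the single missed "oppose" tweet pulls the macro average down to 5/9 even though three of four predictions are correct.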

2023

HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language
Shantipriya Parida | Idris Abdulmumin | Shamsuddeen Hassan Muhammad | Aneesh Bose | Guneet Singh Kohli | Ibrahim Said Ahmad | Ketan Kotwal | Sayan Deb Sarkar | Ondřej Bojar | Habeebah Kakudi
Findings of the Association for Computational Linguistics: ACL 2023

This paper presents “HaVQA”, the first multimodal dataset for visual question answering (VQA) tasks in the Hausa language. The dataset was created by manually translating 6,022 English question-answer pairs, which are associated with 1,555 unique images from the Visual Genome dataset. As a result, the dataset provides 12,044 gold standard English-Hausa parallel sentences that were translated in a fashion that guarantees their semantic match with the corresponding visual information. We conducted several baseline experiments on the dataset, including visual question answering, visual question elicitation, and text-only and multimodal machine translation.

OdiaGenAI’s Participation at WAT2023
Sk Shahid | Guneet Singh Kohli | Sambit Sekhar | Debasish Dhal | Adit Sharma | Shubhendra Kushwaha | Shantipriya Parida | Stig-Arne Grönroos | Satya Ranjan Dash
Proceedings of the 10th Workshop on Asian Translation

This paper offers an in-depth overview of the translation system submitted by team “ODIAGEN” to the Workshop on Asian Translation (WAT2023). Our focus lies in the domain of Indic Multimodal tasks, specifically targeting English to Hindi, English to Malayalam, and English to Bengali translations. The system uses a state-of-the-art Transformer-based architecture, specifically the NLLB-200 model, fine-tuned with language-specific Visual Genome datasets. With this system, we were able to handle both text-to-text and multimodal translation, demonstrating versatility across translation modes. Our results show strong performance across the board, with particularly promising results in the Hindi and Bengali translation tasks. A noteworthy achievement of our system is its performance across all text-to-text translation tasks: in English to Hindi, English to Bengali, and English to Malayalam translation, our system claimed the top positions for both the evaluation and challenge sets. This system not only advances our understanding of the challenges and nuances of Indic language translation but also opens avenues for future research to enhance translation accuracy and performance.

Arguably at SemEval-2023 Task 11: Learning the disagreements using unsupervised behavioral clustering and language models
Guneet Singh Kohli | Vinayak Tiwari
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

We describe our approach to SemEval-2023 Task 11, based on behavioral segregation of annotations to find the similarities and contextual thinking of a group of annotators. We applied a behavioral segmentation analysis to the annotators to model them independently and combined the results to yield soft and hard scores. Our team focused on experimenting with hierarchical clustering using various distance metrics for similarity, dissimilarity, and reliability. We modeled the clusters and assigned weights to them to compute the soft and hard scores. After rigorous experiments, our team was able to uncover hidden behavioral patterns among the annotators’ judgments. The proposed system is made available.
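The abstract does not give implementation details, so the following is only a rough sketch of the general idea: group annotators by how similarly they vote, then combine weighted per-cluster vote rates into a soft score and a thresholded hard score. The Hamming distance, average linkage, cluster weights, 0.5 threshold, and all names (`agglomerate`, `soft_and_hard`, `votes`) are assumptions made for illustration and are not the authors' method.

```python
from itertools import combinations


def hamming(a, b):
    """Fraction of items on which two annotators disagree."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def agglomerate(votes, k):
    """Average-linkage agglomerative clustering of annotators into k groups.

    votes maps an annotator name to a list of binary labels, one per item.
    """
    clusters = [[name] for name in votes]

    def linkage(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(hamming(votes[a], votes[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > k:
        # Merge the two clusters with the smallest average pairwise distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def soft_and_hard(votes, clusters, weights, item):
    """Weighted mean of per-cluster positive-vote rates, plus a 0.5 threshold."""
    rates = [sum(votes[a][item] for a in c) / len(c) for c in clusters]
    soft = sum(w * r for w, r in zip(weights, rates)) / sum(weights)
    return soft, int(soft >= 0.5)


# Hypothetical annotations: four annotators, four items, binary labels.
votes = {
    "a1": [1, 1, 0, 0],
    "a2": [1, 1, 0, 1],
    "a3": [0, 0, 1, 1],
    "a4": [0, 0, 1, 0],
}
clusters = agglomerate(votes, k=2)
print(clusters)
print(soft_and_hard(votes, clusters, weights=[3.0, 1.0], item=0))
```

With these toy votes, annotators a1 and a2 end up in one cluster and a3 and a4 in the other; weighting the first cluster three times as heavily gives item 0 a soft score of 0.75 and a hard label of 1.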

2022

Team AINLPML @ MuP in SDP 2021: Scientific Document Summarization by End-to-End Extractive and Abstractive Approach
Sandeep Kumar | Guneet Singh Kohli | Kartik Shinde | Asif Ekbal
Proceedings of the Third Workshop on Scholarly Document Processing

This paper introduces the summarization system proposed by the AINLPML team for the First Shared Task on Multi-Perspective Scientific Document Summarization at SDP 2022. We present a method to produce abstractive summaries of scientific documents. First, we perform an extractive summarization step to identify the essential parts of the paper. This step uses a contributing sentence identification model to determine the contributing sentences in selected sections and portions of the text. In the next step, the extracted information is used to condition a transformer language model to generate an abstractive summary. In particular, we fine-tuned the pre-trained BART model on the extracted summary from the previous step. Our proposed model outperformed the baseline provided by the organizers by a significant margin. Our approach achieves the best average ROUGE F1 score, ROUGE-2 F1 score, and ROUGE-L F1 score among all submissions.
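For readers unfamiliar with the ROUGE-2 F1 score reported above, the following is a simplified sketch of an n-gram overlap F1 in plain Python. It assumes lowercased whitespace tokenization with no stemming, so it will not reproduce the numbers from the official ROUGE toolkit or the shared task's scorer; the function names and example sentences are illustrative assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(reference, candidate, n=2):
    """Simplified ROUGE-N F1: clipped n-gram overlap between two strings."""
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    overlap = sum((ref & cand).values())  # clipped counts of shared n-grams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference and system summaries.
reference = "the model summarizes scientific papers"
candidate = "the model summarizes papers"
print(round(rouge_n_f1(reference, candidate, n=2), 4))  # 0.5714
```

Here two of the candidate's three bigrams appear among the reference's four bigrams, giving precision 2/3, recall 1/2, and F1 of 4/7.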