Sukomal Pal


2023

pdf bib
Domain Aligned Prefix Averaging for Domain Generalization in Abstractive Summarization
Pranav Nair | Sukomal Pal | Pradeepika Verma
Findings of the Association for Computational Linguistics: ACL 2023

Domain generalization is hitherto an underexplored area applied in abstractive summarization. Moreover, most existing works on domain generalization have sophisticated training algorithms. In this paper, we propose a lightweight, weight averaging based, Domain Aligned Prefix Averaging approach to domain generalization for abstractive summarization. Given a number of source domains, our method first trains a prefix for each one of them. These source prefixes generate summaries for a small number of target domain documents. The similarity of the generated summaries to their corresponding source documents is used for calculating weights required to average source prefixes. In DAPA, prefix tuning allows for lightweight finetuning, and weight averaging allows for the computationally efficient addition of new source domains. When evaluated on four diverse summarization domains, DAPA shows comparable or better performance against the baselines demonstrating the effectiveness of its prefix averaging scheme.

2020

pdf bib
An Indian Language Social Media Collection for Hate and Offensive Speech
Anita Saroj | Sukomal Pal
Proceedings of the Workshop on Resources and Techniques for User and Author Profiling in Abusive Language

In social media, people express themselves every day on issues that affect their lives. During the parliamentary elections, people’s interaction with the candidates in social media posts reflects a lot of social trends in a charged atmosphere. People’s likes and dislikes on leaders, political parties and their stands often become subject of hate and offensive posts. We collected social media posts in Hindi and English from Facebook and Twitter during the run-up to the parliamentary election 2019 of India (PEI data-2019). We created a dataset for sentiment analysis into three categories: hate speech, offensive and not hate, or not offensive. We report here the initial results of sentiment classification for the dataset using different classifiers.

pdf bib
IRLab@IITBHU at WNUT-2020 Task 2: Identification of informative COVID-19 English Tweets using BERT
Supriya Chanda | Eshita Nandy | Sukomal Pal
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)

This paper reports our submission to the shared Task 2: Identification of informative COVID-19 English tweets at W-NUT 2020. We attempted a few techniques, and we briefly explain here two models that showed promising results in tweet classification tasks: DistilBERT and FastText. DistilBERT achieves a F1 score of 0.7508 on the test set, which is the best of our submissions.

pdf bib
IRlab@IITV at SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media Using SVM
Anita Saroj | Supriya Chanda | Sukomal Pal
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper describes the IRlab@IIT-BHU system for the OffensEval 2020. We take the SVM with TF-IDF features to identify and categorize hate speech and offensive language in social media for two languages. In subtask A, we used a linear SVM classifier to detect abusive content in tweets, achieving a macro F1 score of 0.779 and 0.718 for Arabic and Greek, respectively.