Sourav Saha

2025

Retriv at BLP-2025 Task 2: Test-Driven Feedback-Guided Framework for Bangla-to-Python Code Generation
K M Nafi Asib | Sourav Saha | Mohammed Moshiul Hoque
Proceedings of the Second Workshop on Bangla Language Processing (BLP-2025)

Large Language Models (LLMs) have advanced the automated generation of code from natural language prompts. However, low-resource languages (LRLs) like Bangla remain underrepresented due to the limited availability of instruction-to-code datasets and evaluation benchmarks. To address this, the BLP Workshop at IJCNLP-AACL 2025 introduced a shared task on “Code Generation in Bangla”. In this work, we propose a method that combines instruction prompting with a test-driven, feedback-guided iterative refinement process using a fine-tuned Qwen2.5-14B model. The model generates code from Bangla instructions, tests it against unit tests, and iteratively refines any failing outputs through three evaluation passes, using test feedback to guide each step. This approach helped our team “Retriv” to secure 2nd place in the shared task with a Pass@1 score of 0.934. The analysis highlights challenges in Bangla instruction understanding and Python code generation, emphasizing the need for targeted methods in LRLs. We made experimental scripts publicly available for the community.
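
The test-driven refinement loop described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `run_tests`, `refine`, and `stub_generate` are hypothetical, the model call is replaced by a stub, and the limit of three passes is one reading of "three evaluation passes".

```python
from typing import Callable, List, Optional, Tuple


def run_tests(code: str, tests: List[str]) -> Optional[str]:
    """Run assert-style tests against candidate code.

    Returns None if every test passes, otherwise a short error message
    that can be fed back to the generator. A real system should execute
    untrusted code in a sandbox; plain exec() is used here for brevity.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)        # define the candidate function(s)
        for test in tests:
            exec(test, namespace)    # each test is a single assert statement
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def refine(
    instruction: str,
    tests: List[str],
    generate: Callable[[str, Optional[str]], str],
    max_passes: int = 3,             # assumption: at most three passes
) -> Tuple[str, bool]:
    """Generate code, test it, and regenerate with test feedback on failure."""
    feedback: Optional[str] = None
    code = ""
    for _ in range(max_passes):
        code = generate(instruction, feedback)
        feedback = run_tests(code, tests)
        if feedback is None:         # all tests passed, stop early
            return code, True
    return code, False


def stub_generate(instruction: str, feedback: Optional[str]) -> str:
    """Hypothetical stand-in for the fine-tuned model."""
    if feedback is None:
        return "def add(a, b):\n    return a - b"   # buggy first attempt
    return "def add(a, b):\n    return a + b"       # corrected after feedback


final_code, passed = refine(
    "দুটি সংখ্যার যোগফল নির্ণয় করুন",
    ["assert add(2, 3) == 5"],
    stub_generate,
)
```

In this toy run the first attempt fails its unit test, the failure message is passed back to the generator, and the second attempt passes, so the loop stops before the third pass.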

Retriv at BLP-2025 Task 1: A Transformer Ensemble and Multi-Task Learning Approach for Bangla Hate Speech Identification
Sourav Saha | K M Nafi Asib | Mohammed Moshiul Hoque
Proceedings of the Second Workshop on Bangla Language Processing (BLP-2025)

This paper addresses the problem of Bangla hate speech identification, a socially impactful yet linguistically challenging task. As part of the Bangla Multi-task Hate Speech Identification shared task at the BLP Workshop, IJCNLP-AACL 2025, we participated in all three subtasks: (1A) hate type classification, (1B) target group identification, and (1C) joint detection of type, severity, and target. For subtasks 1A and 1B, we employed a soft-voting ensemble of transformer models (BanglaBERT, MuRIL, IndicBERTv2). For subtask 1C, we trained three multitask variants and aggregated their predictions through a weighted voting ensemble. Our systems achieved micro-f₁ scores of 72.75% (1A) and 72.69% (1B), and a weighted micro-f₁ score of 72.62% (1C). On the shared task leaderboard, these corresponded to 9th, 10th, and 7th positions, respectively. These results highlight the promise of transformer ensembles and weighted multitask frameworks for advancing Bangla hate speech detection in low-resource contexts. We made experimental scripts publicly available for the community.
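
Soft voting, as mentioned in this abstract, averages the class probabilities produced by several models and picks the class with the highest mean. The sketch below shows the idea under stated assumptions: `soft_vote` is a hypothetical helper, the probabilities are made-up numbers, and the optional `weights` argument is only one possible reading of the weighted voting used for subtask 1C (the abstract does not say whether weights apply to probabilities or to labels).

```python
from typing import List, Optional, Sequence


def soft_vote(
    prob_sets: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> int:
    """Return the index of the class with the highest (weighted) mean probability.

    prob_sets holds one probability vector per model, all over the same classes.
    """
    if weights is None:
        weights = [1.0] * len(prob_sets)    # plain soft voting: equal weights
    total = sum(weights)
    n_classes = len(prob_sets[0])
    averaged: List[float] = [
        sum(w * probs[c] for w, probs in zip(weights, prob_sets)) / total
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=averaged.__getitem__)


# Illustrative probabilities from three hypothetical models over three classes.
model_probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.3, 0.4, 0.3],
]
equal_choice = soft_vote(model_probs)                        # equal weights
weighted_choice = soft_vote(model_probs, weights=[3, 1, 1])  # favour the first model
```

With equal weights the averaged distribution favours the second class; giving the first model three times the weight shifts the decision to the first class, which is how a weighted ensemble lets a stronger model dominate.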

2024

BnPC: A Gold Standard Corpus for Paraphrase Detection in Bangla, and its Evaluation
Sourav Saha | Zeshan Ahmed Nobin | Mufassir Ahmad Chowdhury | Md. Shakirul Hasan Khan Mobin | Mohammad Ruhul Amin | Sudipta Kar
Proceedings of the 17th Workshop on Building and Using Comparable Corpora (BUCC) @ LREC-COLING 2024

2023

This paper presents a computational approach for creating and benchmarking a dataset on communal violence in the context of Bangladesh and West Bengal, India. In recent years, social media has been used as a weapon by factions of different religions and backgrounds to incite hatred, resulting in physical communal violence and causing death and destruction. To prevent such abusive use of online platforms, we propose a framework for classifying online posts using an adaptive question-based approach. We collected more than 168,000 YouTube comments from a set of manually selected videos known for inciting violence in Bangladesh and West Bengal. Using first unsupervised and then semi-supervised topic modeling methods on this unstructured data, we discovered the major word clusters to interpret the related topics of peace and violence. Topic words were then used to select 20,142 posts related to peace and violence, of which we annotated a total of 6,046 posts. Finally, we applied different modeling techniques based on linguistic features and sentence transformers to benchmark the labeled dataset, with the best-performing model reaching ~71% macro F1 score.

This paper presents our solution, garNER, to the SemEval-2023 MultiCoNER task. We propose a knowledge augmentation approach that directly queries entities from the Wikipedia API and appends the summaries of those entities to the input sentence. These entities are either retrieved from the labeled training set (Gold Entity) or from off-the-shelf entity taggers (Entity Extractor). Ensemble methods are then applied across multiple models to get the final prediction. Our analysis shows that the added contexts are beneficial only when they are relevant to the target named entities, and detrimental when they are irrelevant.

We present a comprehensive technical description of the outcome of the BLP shared task on Violence Inciting Text Detection (VITD). In recent years, social media has become a tool for groups of various religions and backgrounds to spread hatred, leading to physical violence with devastating consequences. To address this challenge, the VITD shared task was initiated, aiming to classify the level of violence incitement in various texts. The competition garnered significant interest, with a total of 27 teams consisting of 88 participants successfully submitting their systems to the CodaLab leaderboard. During the post-workshop phase, we received 16 system papers on VITD from those participants. In this paper, we discuss the VITD baseline performance, present an error analysis of the submitted models, and provide a comprehensive summary of the computational techniques applied by the participating teams.
