Nabeel Mohammed

2026

BanSuite: A Unified Toolkit and Software Platform for Low-Resource NLP in Bangla
Md. Abu Sayed | Faisal Ahamed Khan | Jannatul Ferdous Tuli | Nabeel Mohammed | Mohammad Ruhul Amin | Mohammad Mamun Or Rashid
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 3: System Demonstrations)

Bangla is one of the world’s most widely spoken languages, yet it remains significantly under-resourced in natural language processing (NLP). Existing efforts have focused on isolated tasks such as Part-of-Speech (POS) tagging and Named Entity Recognition (NER), but comprehensive, integrated systems for core NLP tasks including Shallow Parsing and Dependency Parsing are largely absent. To address this gap, we present BanSuite, a unified Bangla NLP ecosystem developed under the EBLICT project. BanSuite combines a large-scale, manually annotated Bangla Treebank with high-quality pretrained models for POS tagging, NER, shallow parsing, and dependency parsing, achieving strong in-domain baseline performance (POS: 90.16 F1, NER: 90.11 F1, SP: 86.92 F1, DP: 90.27 UAS). The system is accessible through a Python toolkit (Bkit) and a Web Application, providing both researchers and non-technical users with robust NLP functionalities, including tokenization, normalization, lemmatization, and syntactic parsing. In benchmarking against existing Bangla NLP tools and multilingual Large Language Models (LLMs), BanSuite demonstrates superior task performance while maintaining high efficiency in resource usage. By offering the first comprehensive, open, and integrated NLP platform for Bangla, BanSuite lays a scalable foundation for research, application development, and further advancement of low-resource language technologies. A demonstration video is provided to illustrate the system’s functionality in https://youtu.be/3pcfiUQfCoA

2025

pdf bib abs

Generative Data Augmentation for Improving Semantic Classification
Shadman Rohan | Mahmud Elahi Akhter | Ibraheem Muhammad Moosa | Nabeel Mohammed | Amin Ahsan Ali | Akmmahbubur Rahman
Proceedings of the Second Workshop on Bangla Language Processing (BLP-2025)

We study sentence-level generative data augmentation for Bangla semantic classification across four public datasets and three pretrained model families (BanglaBERT, XLM-Indic, mBERT). We evaluate two widely used, reproducible techniques—paraphrasing (mT5-based) and round-trip backtranslation (Bn–En–Bn)—and analyze their impact under realistic class imbalance. Overall, augmentation often helps, but gains are tightly coupled to label quality: paraphrasing typically outperforms backtranslation and yields the most consistent improvements for the monolingual model, whereas multilingual encoders benefit less and can be more sensitive to noisy minority-class expansions. A key empirical observation is that the neutral class appears to be a major source of annotation noise, which degrades decision boundaries and can cap the benefits of augmentation even when positive/negative classes are clean and polarized. We provide practical guidance for Bangla sentiment pipelines: (i) use simple sentence-level augmentation to rebalance classes when labels are reliable; (ii) allocate additional curation and higher inter-annotator agreement targets to the neutral class. Our results indicate when augmentation helps and suggest that data quality—not model choice alone—can become the limiting factor.

pdf bib abs

NSU_MILab at BLP-2025 Task 1: Decoding Bangla Hate Speech: Fine-Grained Type and Target Detection via Transformer Ensembles
Md. Mohibur Rahman Nabil | Muhammad Rafsan Kabir | Rakib Islam | Fuad Rahman | Nabeel Mohammed | Shafin Rahman
Proceedings of the Second Workshop on Bangla Language Processing (BLP-2025)

This paper describes our participation in Task 1A and Task 1B of the Task 1A and Task 1B of the BLP Workshop, focused on Bangla Multi-task Hatespeech Identification. Our approach involves systematic evaluation of four transformer models: BanglaBERT, XLM-RoBERTa, IndicBERT, and Bengali Abusive MuRIL. To enhance performance, we implemented an ensemble strategy that averages output probabilities from these transformer models, which consistently outperformed individual models across both tasks. The baseline classical methods demonstrated limitations in capturing complex linguistic cues, underscoring the superiority of transformer-based approaches for low-resource hate speech detection. Our solution initially achieved F1 scores of 0.7235 (ranked 12th) for Task 1A and 0.6981 (ranked 17th) for Task 1B among participating teams. Through post-competition refinements, we improved our Task 1B performance to 0.7331, demonstrating the effectiveness of ensemble methods in Bangla hate speech detection.

pdf bib abs

Bidirectional Reasoning Supervision for Multilingual Financial Decision Making
Muhammad Rafsan Kabir | Jawad Ibn Ahad | Robin Krambroeckers | Silvia Ahmed | M M Lutfe Elahi | Nabeel Mohammed | Shafin Rahman
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track

Large Language Models have achieved great success in tasks like sentiment analysis, machine translation, and question answering, yet their effectiveness in the multilingual financial domain remains less explored. This study explores the potential of generative LLMs for classifying financial sustainability in four diverse languages: English, Hindi, Bengali, and Telugu, representing low, medium, and high-resource language categories. We propose a novel fine-tuning approach that integrates both positive and negative rationales alongside classification labels. Unlike existing approaches, our method improves classification performance by incorporating structured bidirectional reasoning into financial decision-making. Extensive evaluations demonstrate that the proposed approach consistently outperforms prior methods across all four languages, establishing new benchmark results for multilingual financial NLP. Notably, it also enables smaller models to achieve competitive or even superior performance compared to significantly larger models fine-tuned with conventional methods, demonstrating its suitability for industry applications.

pdf bib abs

CAPSTONE: Composable Attribute‐Prompted Scene Translation for Zero‐Shot Vision–Language Reasoning
Md. Ismail Hossain | Shahriyar Zaman Ridoy | Moshiur Farazi | Nabeel Mohammed | Shafin Rahman
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track

Interpreting visual scenes with high-level reasoning is essential for many real-world applications, such as autonomous systems andcontent moderation, but training and maintaining Vision–Language Models (VLMs) remains resource-intensive and opaque. In this work, we present CAPSTONE, a lightweight, modular framework designed for industrial settings. Instead of relying on multimodal training or fine-tuning large models, CAPSTONE transforms outputs from off-the-shelf vision models into structured text prompts that can be interpreted by a frozen Large Language Model (LLM). This plug-and-play architecture enables reasoning over visual input without access to raw pixels, dramatically reducing computational cost and complexity. On the POPE dataset, our system, using a 7B LLM, outperforms several fully trained VLMs in zero-shot evaluations, while on the VSR benchmark, the 4B model achieves competitive results, together demonstrating strong generalization without retraining. CAPSTONE offers a scalable and interpretable alternative for companies looking to integrate visual reasoning capabilities without the burden of full-scale VLM pipelines.

pdf bib abs

Large Language Models (LLMs) excel at complexreasoning tasks, yet their performance hinges on the quality of their prompts and pipeline structures. Manual promptdesign, as used in frameworks like DSPy, poses significantlimitations: it is time-intensive, demands substantial expertise,and lacks scalability, restricting the widespread use of LLMsacross diverse applications. To overcome these challenges, weintroduce AutoDSPy, the first framework to fully automateDSPy pipeline construction using reinforcement learning (RL).AutoDSPy leverages an RL-tuned policy network to dynamicallyselect optimal reasoning modules—such as Chain-of-Thought forlogical tasks or ReAct for tool integration—along with inputoutput signatures and execution strategies, entirely eliminatingthe need for manual configuration. Experimental results on theGSM8K and HotPotQA benchmarks demonstrate that AutoDSPyoutperforms traditional DSPy baselines, achieving accuracy gainsof up to 4.3% while reducing inference time, even with smallermodels like GPT-2 (127M). By integrating RL-based automation,AutoDSPy enhances both efficiency and accessibility, simplifyingthe development of structured, high-performing LLM solutionsand enabling scalability across a wide range of tasks

pdf bib abs

In this study, we introduce BanNERD, the most extensive human-annotated and validated Bangla Named Entity Recognition Dataset to date, comprising over 85,000 sentences. BanNERD is curated from a diverse array of sources, spanning over 29 domains, thereby offering a comprehensive range of generalized contexts. To ensure the dataset’s quality, expert linguists developed a detailed annotation guideline tailored to the Bangla language. All annotations underwent rigorous validation by a team of validators, with final labels being determined via majority voting, thereby ensuring the highest annotation quality and a high IAA score of 0.88. In a cross-dataset evaluation, models trained on BanNERD consistently outperformed those trained on four existing Bangla NER datasets. Additionally, we propose a method named BanNERCEM (Bangla NER context-ensemble Method) which outperforms existing approaches on Bangla NER datasets and performs competitively on English datasets using lightweight Bangla pretrained LLMs. Our approach passes each context separately to the model instead of previous concatenation-based approaches achieving the highest average macro F1 score of 81.85% across 10 NER classes, outperforming previous approaches and ensuring better context utilization. We are making the code and datasets publicly available at https://github.com/eblict-gigatech/BanNERD in order to contribute to the further advancement of Bangla NLP.

2023

pdf bib abs

This paper presents a computational approach for creating a dataset on communal violence in the context of Bangladesh and West Bengal of India and benchmark evaluation. In recent years, social media has been used as a weapon by factions of different religions and backgrounds to incite hatred, resulting in physical communal violence and causing death and destruction. To prevent such abusive use of online platforms, we propose a framework for classifying online posts using an adaptive question-based approach. We collected more than 168,000 YouTube comments from a set of manually selected videos known for inciting violence in Bangladesh and West Bengal. Using both unsupervised and later semi-supervised topic modeling methods on those unstructured data, we discovered the major word clusters to interpret the related topics of peace and violence. Topic words were later used to select 20,142 posts related to peace and violence of which we annotated a total of 6,046 posts. Finally, we applied different modeling techniques based on linguistic features, and sentence transformers to benchmark the labeled dataset with the best-performing model reaching ~71% macro F1 score.

pdf bib abs

We present the comprehensive technical description of the outcome of the BLP shared task on Violence Inciting Text Detection (VITD).In recent years, social media has become a tool for groups of various religions and backgrounds to spread hatred, leading to physicalviolence with devastating consequences. To address this challenge, the VITD shared task was initiated, aiming to classify the level of violence incitement in various texts. The competition garnered significant interest with a total of 27 teams consisting of 88 participants successfully submitting their systems to the CodaLab leaderboard. During the post-workshop phase, we received 16 system papers on VITD from those participants. In this paper, we intend to discuss the VITD baseline performance, error analysis of the submitted models, and provide a comprehensive summary of the computational techniques applied by the participating teams

pdf bib abs

Lemmatization holds significance in both natural language processing (NLP) and linguistics, as it effectively decreases data density and aids in comprehending contextual meaning. However, due to the highly inflected nature and morphological richness, lemmatization in Bangla text poses a complex challenge. In this study, we propose linguistic rules for lemmatization and utilize a dictionary along with the rules to design a lemmatizer specifically for Bangla. Our system aims to lemmatize words based on their parts of speech class within a given sentence. Unlike previous rule-based approaches, we analyzed the suffix marker occurrence according to the morpho-syntactic values and then utilized sequences of suffix markers instead of entire suffixes. To develop our rules, we analyze a large corpus of Bangla text from various domains, sources, and time periods to observe the word formation of inflected words. The lemmatizer achieves an accuracy of 96.36% when tested against a manually annotated test dataset by trained linguists and demonstrates competitive performance on three previously published Bangla lemmatization datasets. We are making the code and datasets publicly available at https://github.com/eblict-gigatech/BanLemma in order to contribute to the further advancement of Bangla NLP.

pdf bib abs

BenCoref: A Multi-Domain Dataset of Nominal Phrases and Pronominal Reference Annotations
Shadman Rohan | Mojammel Hossain | Mohammad Mamun Or Rashid | Nabeel Mohammed
Proceedings of the 17th Linguistic Annotation Workshop (LAW-XVII)

Coreference Resolution is a well studied problem in NLP. While widely studied for English and other resource-rich languages, research on coreference resolution in Bengali largely remains unexplored due to the absence of relevant datasets. Bengali, being a low-resource language, exhibits greater morphological richness compared to English. In this article, we introduce a new dataset, BenCoref, comprising coreference annotations for Bengali texts gathered from four distinct domains. This relatively small dataset contains 5200 mention annotations forming 502 mention clusters within 48,569 tokens. We describe the process of creating this dataset and report performance of multiple models trained using BenCoref. We anticipate that our work sheds some light on the variations in coreference phenomena across multiple domains in Bengali and encourages the development of additional resources for Bengali. Furthermore, we found poor crosslingual performance at zero-shot setting from English, highlighting the need for more language-specific resources for this task.

Co-authors

Venues

LAW1

Fix author