Md Zobaer Hossain


2024

pdf bib
NumDecoders at SemEval-2024 Task 7: FlanT5 and GPT enhanced with CoT for Numerical Reasoning
Andres Gonzalez | Md Zobaer Hossain | Jahedul Alam Junaed
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

In this paper we present a Chain-of-Thought enhanced solution for large language models, including flanT5 and GPT 3.5 Turbo, aimed at solving mathematical problems to fill in blanks from news headlines. Our approach builds on adata augmentation strategy that incorporates additional mathematical reasoning observations into the original dataset sourced from another mathematical corpus. Both automatic and manual annotations are applied to explicitly describe the reasoning steps required for models to reach the target answer. We employ an ensemble majority voting method to generate finalpredictions across our best-performing models. Our analysis reveals that while larger models trained with our enhanced dataset achieve significant gains (91% accuracy, ranking 5th on the NumEval Task 3 leaderboard), smaller models do not experience improvements and may even see a decrease in overall accuracy. We conclude that improving our automatic an-notations via crowdsourcing methods can be a worthwhile endeavor to train larger models than the ones from this study to see the most accurate results.

2023

pdf bib
garNER at SemEval-2023: Simplified Knowledge Augmentation for Multilingual Complex Named Entity Recognition
Md Zobaer Hossain | Averie Ho Zoen So | Silviya Silwal | H. Andres Gonzalez Gongora | Ahnaf Mozib Samin | Jahedul Alam Junaed | Aritra Mazumder | Sourav Saha | Sabiha Tahsin Soha
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper presents our solution, garNER, to the SemEval-2023 MultiConer task. We propose a knowledge augmentation approach by directly querying entities from the Wikipedia API and appending the summaries of the entities to the input sentence. These entities are either retrieved from the labeled training set (Gold Entity) or from off-the-shelf entity taggers (Entity Extractor). Ensemble methods are then applied across multiple models to get the final prediction. Our analysis shows that the added contexts are beneficial only when such contexts are relevant to the target-named entities, but detrimental when the contexts are irrelevant.

2020

pdf bib
BanFakeNews: A Dataset for Detecting Fake News in Bangla
Md Zobaer Hossain | Md Ashraful Rahman | Md Saiful Islam | Sudipta Kar
Proceedings of the Twelfth Language Resources and Evaluation Conference

Observing the damages that can be done by the rapid propagation of fake news in various sectors like politics and finance, automatic identification of fake news using linguistic analysis has drawn the attention of the research community. However, such methods are largely being developed for English where low resource languages remain out of the focus. But the risks spawned by fake and manipulative news are not confined by languages. In this work, we propose an annotated dataset of ≈ 50K news that can be used for building automated fake news detection systems for a low resource language like Bangla. Additionally, we provide an analysis of the dataset and develop a benchmark system with state of the art NLP techniques to identify Bangla fake news. To create this system, we explore traditional linguistic features and neural network based methods. We expect this dataset will be a valuable resource for building technologies to prevent the spreading of fake news and contribute in research with low resource languages. The dataset and source code are publicly available at https://github.com/Rowan1697/FakeNews.