Akm Shahariar Azad Rabby
Also published as: AKM Shahariar Azad Rabby
2026
BornoDrishti: Leveraging Vision Encoders and Domain-Adaptive Learning for Bangla OCR on Diverse Documents
S M Jishanul Islam | Md Mehedi Hasan | Masbul Haider Ovi | Akm Shahariar Azad Rabby | Fuad Rahman
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track)
OCR for Bangla scripts remains a challenging problem, with existing solutions limited to single-domain processing. Current approaches lack a unified vision encoder that can understand diverse Bangla script variations, hindering practical deployment. We present BornoDrishti, the first unified OCR system based on the vision transformer that accurately recognizes both printed and handwritten Bangla scripts within a single model. Our approach introduces a novel domain objective that enables the model to learn domain-invariant representations while preserving script-specific features, eliminating the need for separate domain experts. BornoDrishti achieves competitive accuracy across both domains, setting state-of-the-art performance for printed scripts and demonstrating that a single unified model can match or exceed specialized uni-domain systems. We evaluate our model against state-of-the-art domain-specific and cross-domain OCR systems. This work establishes a foundation for advancing practical applications by using a unified multi-domain OCR system for complex Bangla scripts.
2023
Gold Standard Bangla OCR Dataset: An In-Depth Look at Data Preprocessing and Annotation Processes
Hasmot Ali | AKM Shahariar Azad Rabby | Md Majedul Islam | A.k.m Mahamud | Nazmul Hasan | Fuad Rahman
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track
This research paper focuses on developing an improved Bangla Optical Character Recognition (OCR) system, addressing the challenges posed by the complexity of Bangla text structure, diverse handwriting styles, and the scarcity of comprehensive datasets. Leveraging recent advancements in Deep Learning and OCR techniques, we anticipate a significant enhancement in the performance of Bangla OCR by utilizing a large and diverse collection of labeled Bangla text image datasets. This study introduces the most extensive gold standard corpus for Bangla characters and words, comprising over 4 million human-annotated images. Our dataset encompasses various document types, such as Computer Compose, Letterpress, Typewriters, Outdoor Banner-Poster, and Handwritten documents, gathered from diverse sources. The entire corpus has undergone meticulous human annotation, employing a controlled annotation procedure consisting of three-step annotation and one-step validation, ensuring adherence to gold standard criteria. This paper provides a comprehensive overview of the complete data collection procedure. The ICT Division, Government of the People’s Republic of Bangladesh, will make the dataset publicly available, facilitating further research and development in Bangla OCR and related domains.