Sujan Kumar Saha


2023

Handwritten Text Segmentation Using U-Net and Shuffled Frog-Leaping Algorithm with Scale Space Technique
Moumita Moitra | Sujan Kumar Saha
Proceedings of the 20th International Conference on Natural Language Processing (ICON)

This paper introduces a new method for segmenting words in handwritten Bangla documents. We found that available handwritten character recognition (HCR) systems do not reach the desired accuracy on text written by school students. Such text is challenging to recognize because of non-uniform gaps between lines and words and ambiguous, overlapping characters. Performance may improve if the words are segmented correctly before recognition. For segmentation, we propose combining U-Net with a modified Scale Space method enhanced by the Shuffled Frog-Leaping Algorithm (SFLA). We employ the U-Net model for line segmentation; it effectively handles variable spacing and skewed lines. After line segmentation, we segment words using SFLA with Scale Space, which allows adaptive scaling and optimized parameter tuning. The proposed technique was tested on two datasets: the openly available BN-HTR dataset and an in-house dataset prepared by collecting Bengali handwritten answer books from schools. In our experiments, the proposed technique achieved promising performance on both datasets.

2015

A System for Generating Multiple Choice Questions: With a Novel Approach for Sentence Selection
Mukta Majumder | Sujan Kumar Saha
Proceedings of the 2nd Workshop on Natural Language Processing Techniques for Educational Applications

2008

Word Clustering and Word Selection Based Feature Reduction for MaxEnt Based Hindi NER
Sujan Kumar Saha | Pabitra Mitra | Sudeshna Sarkar
Proceedings of ACL-08: HLT

A Hybrid Feature Set based Maximum Entropy Hindi Named Entity Recognition
Sujan Kumar Saha | Sudeshna Sarkar | Pabitra Mitra
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

A Hybrid Named Entity Recognition System for South and South East Asian Languages
Sujan Kumar Saha | Sanjay Chatterji | Sandipan Dandapat | Sudeshna Sarkar | Pabitra Mitra
Proceedings of the IJCNLP-08 Workshop on Named Entity Recognition for South and South East Asian Languages

Gazetteer Preparation for Named Entity Recognition in Indian Languages
Sujan Kumar Saha | Sudeshna Sarkar | Pabitra Mitra
Proceedings of the 6th Workshop on Asian Language Resources