Pakray Partha


2023

Multilingual Multimodal Text Detection in Indo-Aryan Languages
Basisth Nihar | Halder Eisha | Sachan Tushar | Vetagiri Advaitha | Pakray Partha
Proceedings of the 20th International Conference on Natural Language Processing (ICON)

Multi-language text detection and recognition in complex visual scenes is an essential yet challenging task. Traditional pipelines relying on optical character recognition (OCR) often fail to generalize across different languages, fonts, orientations, and imaging conditions. This work proposes a novel approach using the YOLOv5 object detection model architecture for multi-language text detection in images and videos. We curate and annotate a new dataset of over 4,000 scene text images across four Indian languages and use specialized data augmentation techniques to improve model robustness. Transfer learning from a base YOLOv5 model pretrained on COCO is combined with tailored optimization strategies for multi-language text detection. Our approach achieves state-of-the-art performance, with over 90% accuracy on multi-language text detection across all four languages in our test set. We demonstrate the effectiveness of fine-tuning YOLOv5 for generalized multi-language text extraction across diverse fonts, scales, orientations, and visual contexts. Our approach’s high accuracy and generalizability could enable numerous applications involving multilingual text processing from imagery and video.
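As a rough illustration of the fine-tuning setup the abstract describes, the sketch below uses the standard ultralytics/yolov5 tooling: COCO-pretrained weights are fine-tuned on a scene-text dataset and the resulting checkpoint is loaded for inference. The dataset layout, class names, file paths, and hyperparameters are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch of fine-tuning YOLOv5 for scene-text detection.
# The dataset file and class names below are hypothetical; the paper's
# four Indian languages and annotation scheme are not specified here.
#
# dataset.yaml (illustrative):
#   train: data/scene_text/images/train
#   val:   data/scene_text/images/val
#   names: ['text_lang1', 'text_lang2', 'text_lang3', 'text_lang4']
#
# Fine-tuning from COCO-pretrained weights with the yolov5 repo's CLI:
#   python train.py --img 640 --batch 16 --epochs 100 \
#       --data dataset.yaml --weights yolov5s.pt
#
# Inference with the resulting checkpoint via torch.hub:
import torch

model = torch.hub.load('ultralytics/yolov5', 'custom',
                       path='runs/train/exp/weights/best.pt')  # fine-tuned weights
results = model('street_sign.jpg')   # detect text regions in a scene image
results.print()                      # class, confidence, and box summary
boxes = results.xyxy[0]              # tensor of [x1, y1, x2, y2, conf, cls]
```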

Bi-Quantum Long Short-Term Memory for Part-of-Speech Tagging
Pandey Shyambabu | Pakray Partha
Proceedings of the 20th International Conference on Natural Language Processing (ICON)

Natural language processing (NLP) is a subfield of artificial intelligence that enables computer systems to understand and generate human language. NLP tasks involve machine learning and deep learning methods for processing data. Traditional NLP applications rely on massive datasets and computational resources, which is challenging for classical systems. On the other hand, quantum computing has emerged as a promising technology with the potential to address certain computational problems more efficiently than classical computing. In recent years, researchers have started exploring the application of quantum computing techniques to NLP tasks. In this paper, we propose a quantum-based deep learning model, Bi-Quantum Long Short-Term Memory (BiQLSTM). We apply the proposed model to POS tagging on code-mixed social media datasets.
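The sketch below illustrates the general quantum-LSTM idea behind such hybrid models, not the authors' exact BiQLSTM: each LSTM gate's transformation is replaced by a small variational quantum circuit built with PennyLane, and the cell is run in both directions for a bidirectional POS tagger. Qubit counts, layer sizes, and all names are assumptions for illustration only.

```python
# Illustrative hybrid quantum-classical bidirectional LSTM tagger (a sketch,
# not the paper's BiQLSTM): gate transforms use variational quantum circuits.
import torch
import torch.nn as nn
import pennylane as qml

n_qubits, q_layers = 4, 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, interface="torch")
def gate_circuit(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(n_qubits))
    qml.BasicEntanglerLayers(weights, wires=range(n_qubits))
    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]

weight_shapes = {"weights": (q_layers, n_qubits)}

def quantum_gate(in_dim, out_dim):
    # classical projection -> variational circuit -> classical projection
    return nn.Sequential(
        nn.Linear(in_dim, n_qubits),
        qml.qnn.TorchLayer(gate_circuit, weight_shapes),
        nn.Linear(n_qubits, out_dim),
    )

class QLSTMCell(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        d = input_size + hidden_size
        self.f = quantum_gate(d, hidden_size)  # forget gate
        self.i = quantum_gate(d, hidden_size)  # input gate
        self.g = quantum_gate(d, hidden_size)  # candidate state
        self.o = quantum_gate(d, hidden_size)  # output gate

    def forward(self, x, state):
        h, c = state
        z = torch.cat([x, h], dim=-1)
        c = torch.sigmoid(self.f(z)) * c + torch.sigmoid(self.i(z)) * torch.tanh(self.g(z))
        h = torch.sigmoid(self.o(z)) * torch.tanh(c)
        return h, c

class BiQLSTMTagger(nn.Module):
    """Run the cell left-to-right and right-to-left, concatenate hidden
    states, and map each position to a POS-tag score vector."""
    def __init__(self, vocab_size, tagset_size, emb_dim=8, hidden=4):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.fwd = QLSTMCell(emb_dim, hidden)
        self.bwd = QLSTMCell(emb_dim, hidden)
        self.out = nn.Linear(2 * hidden, tagset_size)
        self.hidden = hidden

    def forward(self, tokens):                      # tokens: (seq_len,)
        x = self.emb(tokens).unsqueeze(1)           # (seq_len, 1, emb_dim)
        h = c = torch.zeros(1, self.hidden)
        fwd_states = []
        for t in range(x.size(0)):
            h, c = self.fwd(x[t], (h, c))
            fwd_states.append(h)
        h = c = torch.zeros(1, self.hidden)
        bwd_states = []
        for t in reversed(range(x.size(0))):
            h, c = self.bwd(x[t], (h, c))
            bwd_states.append(h)
        bwd_states.reverse()
        feats = torch.cat([torch.cat(fwd_states), torch.cat(bwd_states)], dim=-1)
        return self.out(feats)                      # (seq_len, tagset_size)
```

A model like this can be trained with the usual PyTorch cross-entropy loss over per-token tags; the quantum circuits are differentiated through PennyLane's Torch interface, so the whole hybrid network trains end to end.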