Jordan Kralev


2024

Deep Learning Framework for Identifying Future Market Opportunities from Textual User Reviews
Jordan Kralev
Proceedings of the Sixth International Conference on Computational Linguistics in Bulgaria (CLIB 2024)

The paper applies design gap theory to identify future growth and capitalization of market segments from customer reviews of products purchased during a given past period. To build a consumer feature space, an encoder-decoder network with attention is trained on the textual reviews after they pass through tokenization and embedding layers. The encodings of the product reviews are then used to train a variational autoencoder that represents a product feature space. The sampling capabilities of this network are extended with a function that searches for innovative designs with high consumer preference, which characterize future opportunities in a given market segment. The framework is demonstrated on Amazon reviews in the consumer electronics segment.
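
The search step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the scoring rule (novelty multiplied by preference), the names `search_design_gaps`, `novelty` and `toy_preference`, and the two-dimensional toy latent space are all assumptions made for illustration. In a real system the candidates would be drawn from the trained variational autoencoder and decoded into product descriptions.

```python
import math
import random


def novelty(z, known):
    """Distance from candidate z to the nearest known product encoding."""
    return min(math.dist(z, k) for k in known)


def search_design_gaps(known, preference, n_samples=1000, top_k=3, seed=0):
    """Sample latent points from a standard normal prior and rank them.

    The score (novelty * preference) is an illustrative assumption,
    not the scoring function used in the paper.
    """
    rng = random.Random(seed)
    dim = len(known[0])
    candidates = [
        tuple(rng.gauss(0.0, 1.0) for _ in range(dim))
        for _ in range(n_samples)
    ]
    scored = [(novelty(z, known) * preference(z), z) for z in candidates]
    # Sort by score only, highest first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy stand-ins: three existing products in a 2-D latent space.
known_products = [(0.0, 0.0), (1.0, 0.5), (-0.5, 1.0)]


def toy_preference(z):
    # Hypothetical preference model: consumers favour designs near (1, -1).
    return math.exp(-math.dist(z, (1.0, -1.0)) ** 2)


best = search_design_gaps(known_products, toy_preference)
for score, z in best:
    print(f"score={score:.3f} latent={z}")
```

Multiplying the two terms means a candidate ranks highly only when it is both far from existing products and well liked; a weighted sum would be an equally plausible choice.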

2022

Multilingual Image Corpus – Towards a Multimodal and Multilingual Dataset
Svetla Koeva | Ivelina Stoyanova | Jordan Kralev
Proceedings of the Thirteenth Language Resources and Evaluation Conference

One of the processing tasks for large multimodal data streams is automatic image description (image classification, object segmentation and classification). Although the number and diversity of image datasets are constantly expanding, there is still a huge demand for datasets covering a wider variety of domains and object classes. The goal of the Multilingual Image Corpus (MIC 21) project is to provide a large image dataset with annotated objects and object descriptions in 24 languages. The Multilingual Image Corpus consists of an Ontology of visual objects (based on WordNet) and a collection of thematically related images whose objects are annotated with segmentation masks and labels describing the ontology classes. The dataset is designed for image classification, object detection and semantic segmentation. The main contributions of our work are: a) the provision of a large collection of high-quality copyright-free images; b) the formulation of the Ontology of visual objects based on WordNet noun hierarchies; c) the precise manual correction of automatic object segmentation within the images and the annotation of object classes; and d) the association of objects and images with extended multilingual descriptions based on WordNet inner- and interlingual relations. The dataset can also be used for multilingual image caption generation, image-to-text alignment and automatic question answering for images and videos.
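
As a rough illustration of what one annotated object in such a corpus might carry, the sketch below models a single record. The field names, the synset identifier format and the polygon representation of the mask are assumptions for illustration only; the abstract does not specify the corpus's actual storage format.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectAnnotation:
    """One annotated object (hypothetical schema, not the MIC 21 format)."""

    image_id: str
    ontology_class: str   # class name in the ontology of visual objects
    wordnet_synset: str   # assumed identifier format, e.g. "bicycle.n.01"
    polygon: list         # [(x, y), ...] outline of the segmentation mask
    labels: dict = field(default_factory=dict)  # language code -> label

    def area(self):
        """Polygon area in square pixels, computed with the shoelace formula."""
        pts = self.polygon
        total = 0.0
        # Pair each vertex with the next one, wrapping around at the end.
        for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
            total += x1 * y2 - x2 * y1
        return abs(total) / 2.0


ann = ObjectAnnotation(
    image_id="img_0001",
    ontology_class="bicycle",
    wordnet_synset="bicycle.n.01",
    polygon=[(0, 0), (4, 0), (4, 3), (0, 3)],
    labels={"en": "bicycle", "bg": "велосипед"},
)
print(ann.area())  # 12.0 for this 4 x 3 rectangle
```

Keeping the labels in a dictionary keyed by language code lets one record hold descriptions in any number of languages without changing the schema.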

Image Models for large-scale Object Detection and Classification
Jordan Kralev | Svetla Koeva
Proceedings of the Fifth International Conference on Computational Linguistics in Bulgaria (CLIB 2022)

Recent developments in computer vision applications based on machine learning models allow real-time object detection, segmentation and captioning in image or video streams. The paper presents an extension of the 80 COCO categories into a novel ontology with more than 700 classes covering 130 thematic subdomains related to Sport, Transport, Arts and Security. The development of an image dataset for object segmentation was accelerated by using machine learning to generate objects’ boundaries and classes automatically. The Multilingual image dataset contains over 20,000 images and 200,000 annotations. It was used to pre-train 130 models for object detection and classification. We present the approach established for developing the new models and their integration into an application and evaluation framework.