Boris Velichkov


Comparative Analysis of Fine-tuned Deep Learning Language Models for ICD-10 Classification Task for Bulgarian Language
Boris Velichkov | Sylvia Vassileva | Simeon Gerginov | Boris Kraychev | Ivaylo Ivanov | Philip Ivanov | Ivan Koychev | Svetla Boytcheva
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

The task of automatically encoding diagnoses into standard medical classifications and ontologies is of great importance in medicine, both to support physicians' daily tasks in preparing and reporting clinical documentation and for the automatic processing of clinical reports. In this paper we investigate the application and performance of different deep learning transformers for automatic encoding of clinical texts in Bulgarian into ICD-10. The comparative analysis attempts to find which approach is more efficient for fine-tuning a pretrained BERT-family transformer to deal with domain-specific terminology in a rare language such as Bulgarian. On the one hand, we use SlavicBERT and MultilingualBERT, which are pretrained on common vocabulary in Bulgarian but lack medical terminology. On the other hand, we use BioBERT, ClinicalBERT, SapBERT, and BlueBERT, which are pretrained on medical terminology in English but lack language-model training in Bulgarian and, moreover, vocabulary in Cyrillic. In our study, all BERT models are fine-tuned with additional medical texts in Bulgarian and then applied to the classification task of encoding medical diagnoses in Bulgarian into ICD-10 codes. A large corpus of diagnoses in Bulgarian annotated with ICD-10 codes is used for the classification task. Such an analysis gives a good idea of which models would be suitable for tasks of a similar type and domain. The experiments and evaluation results show that both approaches have comparable accuracy.

Automatic Transformation of Clinical Narratives into Structured Format
Sylvia Vassileva | Gergana Todorova | Kristina Ivanova | Boris Velichkov | Ivan Koychev | Galia Angelova | Svetla Boytcheva
Proceedings of the Student Research Workshop Associated with RANLP 2021

Vast amounts of data in healthcare are available in unstructured text format, usually in the local language of each country. These documents contain valuable information. Secondary use of clinical narratives, and the extraction of key facts and relations about the patient's disease history from them, can foster preventive medicine and improve healthcare. In this paper, we propose a hybrid method for the automatic transformation of clinical text into a structured format. The documents are automatically sectioned into the following parts: diagnosis, patient history, patient status, and lab results. For the “Diagnosis” section, deep learning text-based encoding into ICD-10 codes is applied using MBG-ClinicalBERT - a fine-tuned ClinicalBERT model for Bulgarian medical text. From the “Patient History” section, we identify patient symptoms using a rule-based approach enhanced with similarity search based on MBG-ClinicalBERT word embeddings. We also identify symptom relations such as negation. For the “Patient Status” description, binary classification is used to determine the status of each anatomic organ. In this paper, we demonstrate different methods for adapting NLP tools for English and other languages to a low-resource language like Bulgarian.


Deep learning contextual models for prediction of sport event outcome from sportsman’s interviews
Boris Velichkov | Ivan Koychev | Svetla Boytcheva
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

This paper presents an approach for predicting the results of sport events. Sport forecasting approaches are usually based on structured data. We test the hypothesis that sports results can be predicted by applying natural language processing and machine learning techniques to interviews given by the players shortly before the events. The proposed method uses deep learning contextual models applied to unstructured textual documents. Several experiments were performed on interviews with players in individual sports such as boxing, martial arts, and tennis. The results of the conducted experiments confirmed our initial assumption that an interview given by a sportsman before a match contains information that can be used to predict its outcome. Furthermore, the results provide strong evidence in support of our research hypothesis, that is, that we can predict the outcome of a sport match by analyzing an interview given before it.


SU-FMI: System Description for SemEval-2014 Task 9 on Sentiment Analysis in Twitter
Boris Velichkov | Borislav Kapukaranov | Ivan Grozev | Jeni Karanesheva | Todor Mihaylov | Yasen Kiprov | Preslav Nakov | Ivan Koychev | Georgi Georgiev
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)