Galia Angelova


2023

bgGLUE: A Bulgarian General Language Understanding Evaluation Benchmark
Momchil Hardalov | Pepa Atanasova | Todor Mihaylov | Galia Angelova | Kiril Simov | Petya Osenova | Veselin Stoyanov | Ivan Koychev | Preslav Nakov | Dragomir Radev
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We present bgGLUE (Bulgarian General Language Understanding Evaluation), a benchmark for evaluating language models on Natural Language Understanding (NLU) tasks in Bulgarian. Our benchmark includes NLU tasks targeting a variety of NLP problems (e.g., natural language inference, fact-checking, named entity recognition, sentiment analysis, and question answering) and machine learning tasks (sequence labeling, document-level classification, and regression). We run the first systematic evaluation of pre-trained language models for Bulgarian, comparing and contrasting results across the nine tasks in the benchmark. The evaluation results show strong performance on sequence labeling tasks, but considerable room for improvement remains on tasks that require more complex reasoning. We make bgGLUE publicly available together with the fine-tuning and evaluation code, as well as a public leaderboard at https://bgglue.github.io, and we hope that it will enable further advancements in developing NLU models for Bulgarian.

Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing
Ruslan Mitkov | Galia Angelova
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

2021

Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
Ruslan Mitkov | Galia Angelova
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

Automatic Transformation of Clinical Narratives into Structured Format
Sylvia Vassileva | Gergana Todorova | Kristina Ivanova | Boris Velichkov | Ivan Koychev | Galia Angelova | Svetla Boytcheva
Proceedings of the Student Research Workshop Associated with RANLP 2021

Vast amounts of healthcare data are available in unstructured text format, usually in the local language of each country. These documents contain valuable information. Secondary use of clinical narratives, together with extraction of key facts and relations about the patient's disease history from them, can foster preventive medicine and improve healthcare. In this paper, we propose a hybrid method for the automatic transformation of clinical text into a structured format. The documents are automatically sectioned into the following parts: diagnosis, patient history, patient status, and lab results. For the “Diagnosis” section, deep learning text-based encoding into ICD-10 codes is applied using MBG-ClinicalBERT, a fine-tuned ClinicalBERT model for Bulgarian medical text. From the “Patient History” section, we identify patient symptoms using a rule-based approach enhanced with similarity search based on MBG-ClinicalBERT word embeddings. We also identify symptom relations such as negation. For the “Patient Status” description, binary classification is used to determine the status of each anatomic organ. In this paper, we demonstrate different methods for adapting NLP tools for English and other languages to a low-resource language like Bulgarian.

2019

Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)
Ruslan Mitkov | Galia Angelova
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Risk Factors Extraction from Clinical Texts based on Linked Open Data
Svetla Boytcheva | Galia Angelova | Zhivko Angelov
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

This paper presents experiments in risk factor analysis based on clinical texts enhanced with Linked Open Data (LOD). The idea is to determine whether a patient has risk factors for a specific disease by analyzing only his/her outpatient records. A semantic graph of “meta-knowledge” about a disease of interest is constructed, with integrated multilingual terms (labels) of symptoms, risk factors, etc. coming from Wikidata, PubMed, Wikipedia and MeSH, and linked to the clinical records of individual patients via ICD-10 codes. Then a predictive model is trained to predict whether patients are at risk of developing the disease of interest. Testing was done using outpatient records from a nation-wide repository available for the period 2011-2016. The results show an improvement in the overall performance of all tested algorithms (kNN, Naive Bayes, Tree, Logistic regression, ANN) when the clinical texts are enriched with LOD resources.

2018

Tweety at SemEval-2018 Task 2: Predicting Emojis using Hierarchical Attention Neural Networks and Support Vector Machine
Daniel Kopev | Atanas Atanasov | Dimitrina Zlatkova | Momchil Hardalov | Ivan Koychev | Ivelina Nikolova | Galia Angelova
Proceedings of the 12th International Workshop on Semantic Evaluation

We present the system built for SemEval-2018 Task 2 on Emoji Prediction. Although Twitter messages are very short, we managed to design a wide variety of features: textual, semantic, sentiment, emotion-, and color-related ones. We investigated different methods of text preprocessing, including replacing text emojis with respective tokens and splitting hashtags to capture more meaning. To represent text, we used word n-grams and word embeddings. We experimented with a wide range of classifiers, and our best results were achieved using an SVM-based classifier and a Hierarchical Attention Neural Network.

2017

Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017
Ruslan Mitkov | Galia Angelova
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

Mining Association Rules from Clinical Narratives
Svetla Boytcheva | Ivelina Nikolova | Galia Angelova
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

Shallow text analysis (Text Mining) mainly uses Information Extraction techniques. For low-resource languages, such traditional techniques cannot be applied with sufficient accuracy and recall on big data. In contrast, Data Mining approaches provide an opportunity to perform deep analysis and to discover new knowledge. Frequent pattern mining approaches are used mainly for structured information in databases, and applying them in text mining is quite challenging. Unfortunately, most frequent pattern mining approaches do not use contextual information for extracted patterns: general patterns are extracted regardless of the context. We propose a method that processes raw informal texts (from health discussion forums) and formal texts (outpatient records) in Bulgarian. In addition, we use some context information and small terminological lexicons to generalize the extracted frequent patterns. This makes it possible to map informal expressions of medical terminology to formal ones and to generate resources automatically.

Proceedings of the Biomedical NLP Workshop associated with RANLP 2017
Svetla Boytcheva | Kevin Bretonnel Cohen | Guergana Savova | Galia Angelova
Proceedings of the Biomedical NLP Workshop associated with RANLP 2017

Identification of Risk Factors in Clinical Texts through Association Rules
Svetla Boytcheva | Ivelina Nikolova | Galia Angelova | Zhivko Angelov
Proceedings of the Biomedical NLP Workshop associated with RANLP 2017

We describe a method which extracts Association Rules from texts in order to recognise verbalisations of risk factors. Usually some basic vocabulary about risk factors is known, but medical conditions are expressed in clinical narratives with much higher variety. We propose an approach for data-driven learning of specialised medical vocabulary which, once collected, enables early alerting of potentially affected patients. The method is illustrated by experiments with clinical records of patients with Chronic Obstructive Pulmonary Disease (COPD) and comorbidity of CORD, Diabetes Mellitus and Schizophrenia. Our input data come from the Bulgarian Diabetic Register, which is built using a pseudonymised collection of outpatient records for about 500,000 diabetic patients. The generated Association Rules for CORD are analysed in the context of demographic, gender, and age information. Valuable amounts of meaningful words, signalling risk factors, are discovered with high precision and confidence.

Annotation of Clinical Narratives in Bulgarian language
Ivajlo Radev | Kiril Simov | Galia Angelova | Svetla Boytcheva
Proceedings of the Biomedical NLP Workshop associated with RANLP 2017

In this paper we describe the process of annotating clinical texts with morphosyntactic and semantic information. The corpus contains 1,300 discharge letters in Bulgarian for patients with Endocrinology and Metabolic disorders. The annotated corpus will be used as a gold standard for evaluating information extraction on a test corpus of 6,200 discharge letters. The annotation is performed within the Clark system, an XML Based System For Corpora Development. It provides a mechanism for semi-automatic annotation: first, a pipeline for Bulgarian morphosyntactic annotation and a cascaded regular grammar for semantic annotation are run; then rules for cleaning frequent errors are applied. Finally, the result is manually checked. We also hope to be able to adapt the morphosyntactic tagger to the domain of clinical narratives.

2016

SUper Team at SemEval-2016 Task 3: Building a Feature-Rich System for Community Question Answering
Tsvetomila Mihaylova | Pepa Gencheva | Martin Boyanov | Ivana Yovcheva | Todor Mihaylov | Momchil Hardalov | Yasen Kiprov | Daniel Balchev | Ivan Koychev | Preslav Nakov | Ivelina Nikolova | Galia Angelova
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2015

Proceedings of the International Conference Recent Advances in Natural Language Processing
Ruslan Mitkov | Galia Angelova | Kalina Bontcheva
Proceedings of the International Conference Recent Advances in Natural Language Processing

About Emotion Identification in Visual Sentiment Analysis
Olga Kanishcheva | Galia Angelova
Proceedings of the International Conference Recent Advances in Natural Language Processing

Voltron: A Hybrid System For Answer Validation Based On Lexical And Distance Features
Ivan Zamanov | Marina Kraeva | Nelly Hateva | Ivana Yovcheva | Ivelina Nikolova | Galia Angelova
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

2014

Sublanguage Corpus Analysis Toolkit: A tool for assessing the representativeness and sublanguage characteristics of corpora
Irina Temnikova | William A. Baumgartner Jr. | Negacy D. Hailu | Ivelina Nikolova | Tony McEnery | Adam Kilgarriff | Galia Angelova | K. Bretonnel Cohen
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Sublanguages are varieties of language that form “subsets” of the general language, typically exhibiting particular types of lexical, semantic, and other restrictions and deviance. SubCAT, the Sublanguage Corpus Analysis Toolkit, assesses the representativeness and closure properties of corpora to analyze the extent to which they are either sublanguages or representative samples of the general language. The current version of SubCAT contains scripts and applications for assessing lexical closure, morphological closure, sentence type closure, over-represented words, and syntactic deviance. Its operation is illustrated with three case studies concerning scientific journal articles, patents, and clinical records. Materials from two language families are analyzed: English (Germanic) and Bulgarian (Slavic). The software is available at sublanguage.sourceforge.net under a liberal Open Source license.

2013

Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013
Ruslan Mitkov | Galia Angelova | Kalina Bontcheva
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

Enriching Patent Search with External Keywords: a Feasibility Study
Ivelina Nikolova | Irina Temnikova | Galia Angelova
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

Measuring Closure Properties of Patent Sublanguages
Irina Temnikova | Negacy Hailu | Galia Angelova | K. Bretonnel Cohen
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

Closure Properties of Bulgarian Clinical Text
Irina Temnikova | Ivelina Nikolova | William A. Baumgartner | Galia Angelova | K. Bretonnel Cohen
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

Proceedings of the Workshop on NLP for Medicine and Biology associated with RANLP 2013
Guergana Savova | Kevin Bretonnel Cohen | Galia Angelova
Proceedings of the Workshop on NLP for Medicine and Biology associated with RANLP 2013

2012

Automatic Analysis of Patient History Episodes in Bulgarian Hospital Discharge Letters
Svetla Boytcheva | Galia Angelova | Ivelina Nikolova
Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics

2011

Proceedings of the International Conference Recent Advances in Natural Language Processing 2011
Ruslan Mitkov | Galia Angelova
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

Proceedings of the Second Workshop on Biomedical Natural Language Processing
Guergana Savova | Kevin Bretonnel Cohen | Galia Angelova
Proceedings of the Second Workshop on Biomedical Natural Language Processing

Towards Temporal Segmentation of Patient History in Discharge Letters
Galia Angelova | Svetla Boytcheva
Proceedings of the Second Workshop on Biomedical Natural Language Processing

2009

Proceedings of the International Conference RANLP-2009
Galia Angelova | Ruslan Mitkov
Proceedings of the International Conference RANLP-2009

Proceedings of the Workshop on Biomedical Information Extraction
Guergana Savova | Vangelis Karkaletsis | Galia Angelova
Proceedings of the Workshop on Biomedical Information Extraction

Extraction and Exploration of Correlations in Patient Status Data
Svetla Boytcheva | Ivelina Nikolova | Elena Paskaleva | Galia Angelova | Dimitar Tcharaktchiev | Nadya Dimitrova
Proceedings of the Workshop on Biomedical Information Extraction

2004

Towards deeper understanding and personalisation in CALL
Galia Angelova | Albena Strupchanska | Ognyan Kalaydijev | Milena Yankova | Svetla Boytcheva | Irena Vitanova | Preslav Nakov
Proceedings of the Workshop on eLearning for Computational Linguistics and Computational Linguistics for eLearning

1996

NL Domain Explanations in Knowledge Based MAT
Galia Angelova | Kalina Bontcheva
COLING 1996 Volume 2: The 16th International Conference on Computational Linguistics

1990

MORPHO-ASSISTANT: The Proper Treatment of Morphological Knowledge
Kiril Simov | Galia Angelova | Elena Paskaleva
COLING 1990 Volume 3: Papers presented to the 13th International Conference on Computational Linguistics

1982

On an Approach for Designing Linguistic Processors
Radoslav Pavlov | Galia Angelova
Coling 1982 Abstracts: Proceedings of the Ninth International Conference on Computational Linguistics Abstracts