2020
The Ontology of Bulgarian Dialects – Architecture and Information Retrieval
Rositsa Dekova
Proceedings of the Twelfth Language Resources and Evaluation Conference
Following a concise description of its structure, the paper focuses on the potential of the Ontology of Bulgarian Dialects, which demonstrates a novel use of ontological modelling for dialect digital archiving and information processing. The ontology incorporates information on the dialects of the Bulgarian language and includes data from 84 dialects, spoken not only on the territory of the Republic of Bulgaria but also abroad. It encodes both their geographical distribution and some of their main diagnostic features, such as the different mutations (also referred to as reflexes) of some of the Old Bulgarian vowels. The mutations modelled so far in the ontology include the reflex of the back nasal vowel /ѫ/ under stress, the reflex of the back er vowel /ъ/ under stress, and the reflex of the yat vowel /ѣ/ under stress when it precedes a syllable with a back vowel. Besides offering a formal structure for the considerable amount of data gathered over the years by dialectologists, the ontology also provides numerous possibilities for information retrieval: searches by dialect, country, dialect region, city or village, and various combinations of diagnostic features.
2018
Introducing Computational Linguistics and NLP to High School Students
Rositsa Dekova | Adelina Radeva
Proceedings of the Third International Conference on Computational Linguistics in Bulgaria (CLIB 2018)
The paper addresses a possible way of introducing core concepts of Computational Linguistics through problems given at the linguistic contests organized for high school students in Bulgaria and abroad. Following a brief presentation of the foundation and the underlying objective of these contests, we outline some of the problem types, which reflect the different levels of language processing and the diversity of approaches and tasks to be solved. By presenting the variety of problems given over the years, we would like to draw the attention of the academic community to this captivating method through which high school students can be introduced to the challenges and main goals of Computational Linguistics (CL) and Natural Language Processing (NLP).
2016
Stress Patterns of Compounds and MWEs in English and Bulgarian
Bistra Popovska | Rositsa Dekova
Proceedings of the Second International Conference on Computational Linguistics in Bulgaria (CLIB 2016)
The paper presents ongoing research on the stress patterns of compounds and MWEs of the type ADJ+N and their corresponding free NPs in English and Bulgarian. The research focuses on identifying and formally representing the possible stress patterns of compounds, MWEs, and free NPs. So far, we have compiled a corpus of over 2000 compounds and MWEs, approximately 1000 for each language (English and Bulgarian). Our theoretical framework includes elements from different theories, namely Generative Phonology, Metrical Theory, and the theory of Primary Accent First, all of which define stress as a prosodic element. Our main goals are to specify the prosodic region where stress is defined in English and Bulgarian MWEs and noun phrases, and to define the main features of stress in MWEs and free NPs in both languages. The results of our research can be implemented in NLP modules for spoken language processing and generation.
2014
Electronic Language Resources in Teaching Mathematical Linguistics
Ivan Derzhanski | Rositsa Dekova
Proceedings of the First International Conference on Computational Linguistics in Bulgaria (CLIB 2014)
The central role of electronic language resources in education is widely recognised (cf. Brinkley et al., 1999; Bennett, 2010; Derzhanski et al., 2007, among others). The variety and ease of access of such resources predetermine their extensive use in both research and education. With regard to teaching mathematical linguistics, electronic dictionaries and annotated corpora play a particularly important part, being an essential source of information for composing linguistic problems and presenting linguistic knowledge. This paper discusses the need for electronic resources, especially for less studied or low-resource languages, and their creation and various uses in teaching linguistics to secondary school students, with examples mostly drawn from our practical work.
Anaphora – Clause Annotation and Alignment Tool.
Borislav Rizov | Rositsa Dekova
Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics
2012
Application of Clause Alignment for Statistical Machine Translation
Svetla Koeva | Svetlozara Leseva | Ivelina Stoyanova | Rositsa Dekova | Angel Genov | Borislav Rizov | Tsvetana Dimitrova | Ekaterina Tarpomanova | Hristina Kukova
Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation
Bulgarian X-language Parallel Corpus
Svetla Koeva | Ivelina Stoyanova | Rositsa Dekova | Borislav Rizov | Angel Genov
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
The paper presents the methodology and the outcome of the compilation and the processing of the Bulgarian X-language Parallel Corpus (Bul-X-Cor), which was integrated as part of the Bulgarian National Corpus (BulNC). We focus on building representative parallel corpora which include a diversity of domains and genres, reflect the relations between Bulgarian and other languages and are consistent in terms of compilation methodology, text representation, metadata description and annotation conventions. The approaches implemented in the construction of Bul-X-Cor include using readily available text collections on the web, manual compilation (by means of Internet browsing) and preferably automatic compilation (by means of web crawling, both general and focused). Certain levels of annotation applied to Bul-X-Cor are taken as obligatory (sentence segmentation and sentence alignment), while others depend on the availability of tools for a particular language (morpho-syntactic tagging, lemmatisation, syntactic parsing, named entity recognition, word sense disambiguation, etc.) or for a particular task (word and clause alignment). To achieve uniformity of the annotation, we have either annotated raw data from scratch or transformed the already existing annotation to follow the conventions accepted for BulNC. Finally, actual uses of the corpora are presented and conclusions are drawn with respect to future work.