Stanislava Kancheva
2012
A Treebank-driven Creation of an OntoValence Verb lexicon for Bulgarian
Petya Osenova | Kiril Simov | Laska Laskova | Stanislava Kancheva
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
The paper presents a treebank-driven approach to the construction of a Bulgarian valence lexicon with ontological restrictions over the inner participants of the event. First, the underlying ideas behind the Bulgarian Ontology-based lexicon are outlined. Then, the extraction and manipulation of the valence frames are discussed with respect to the BulTreeBank annotation scheme and the DOLCE ontology. The most frequent types of syntactic frames are also specified, as well as the most frequent types of ontological restrictions over the verb arguments. The envisaged applications of such a lexicon are assigning ontological labels to syntactically parsed corpora and expanding the lexicon and lexical information in the Bulgarian Resource Grammar.
Linguistic Analysis Processing Line for Bulgarian
Aleksandar Savkov | Laska Laskova | Stanislava Kancheva | Petya Osenova | Kiril Simov
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
This paper presents a linguistic processing pipeline for Bulgarian that includes morphological analysis, lemmatization and syntactic analysis of Bulgarian texts. The morphological analysis is performed by three modules: two statistical and one rule-based. The combination of these modules achieves the best result for morphological tagging of Bulgarian over a rich tagset (680 tags). The lemmatization is based on rules generated from a large morphological lexicon of Bulgarian. The syntactic analysis is implemented via MaltParser. The two statistical morphological taggers and MaltParser are trained on datasets constructed within the BulTreeBank project. The processing pipeline also includes a sentence splitter and a tokenizer. All tools in the pipeline are packaged as modules that can also run separately. The whole pipeline is designed to serve as the back end of a web-service-oriented interface, but it also supports user tasks through a command-line interface. The processing pipeline is compatible with the Text Corpus Format, which allows it to delegate the management of the components to the WebLicht platform.
2011
Bulgarian-English Parallel Treebank: Word and Semantic Level Alignment
Kiril Simov | Petya Osenova | Laska Laskova | Aleksandar Savkov | Stanislava Kancheva
Proceedings of the Second Workshop on Annotation and Exploitation of Parallel Corpora