Nikolay Paev


2025

Word Sense Disambiguation with Large Language Models: Casing Bulgarian
Nikolay Paev | Kiril Simov | Petya Osenova
Proceedings of the 13th Global Wordnet Conference

Bulgarian Event Extraction with LLMs
Kiril Simov | Nikolay Paev | Petya Osenova | Stefan Marinov
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

The paper presents the results from experiments with two large language models (LLMs), T5 and Llama, for extracting events from a Bulgarian event corpus. Both models were pretrained by us on a 35-billion-token Bulgarian corpus. Extraction was performed within the context of a single sentence. Our approach aims to balance the ACE-oriented approach, which uses triggers for event detection, and the MUC-oriented one, which uses more general event types. The evaluation relies on the IoU (Intersection over Union) of token spans and is twofold. The first part checks the predicted event token span; if the span is correct, the semantic roles within the event are checked as well. The second part checks the triple of event type, semantic roles, and participants. The results are promising, and a qualitative evaluation is provided as well.
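For concreteness, here is a minimal sketch of the span IoU computation in Python, assuming spans are represented as (start, end) token offsets with an exclusive end; the function name and representation are illustrative, not taken from the paper.

```python
def span_iou(pred, gold):
    """IoU of two token spans given as (start, end) offsets, end exclusive."""
    p = set(range(pred[0], pred[1]))
    g = set(range(gold[0], gold[1]))
    union = p | g
    return len(p & g) / len(union) if union else 0.0

# Example: the predicted span shares 2 of the 4 covered tokens with gold.
print(span_iou((3, 6), (4, 7)))  # 2 / 4 = 0.5
```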

Visualization of LLM Annotated Documents
Teodor Todorov Valtchev | Nikolay Paev
Proceedings of the 9th Student Research Workshop associated with the International Conference Recent Advances in Natural Language Processing

The paper presents an automatic annotation and visualization system for documents in the field of Social Sciences and Humanities. The annotation operates on two levels: named entities and events. The system combines automatically generated annotations from language models with a powerful text editor extended to accommodate manual annotation. The goal is to support scientists in the SS&H field in extracting information from historical documents. At the time of writing, the system is still in development.

2024

Introducing Shallow Syntactic Information within the Graph-based Dependency Parsing
Nikolay Paev | Kiril Simov | Petya Osenova
Proceedings of the 22nd Workshop on Treebanks and Linguistic Theories (TLT 2024)

The paper presents a new BERT model, fine-tuned for parsing Bulgarian texts. The model is extended with an additional neural network layer that incorporates shallow syntactic information during the training phase. The results show a statistically significant improvement over the baseline: adding syntactic knowledge, even partial, makes the model better. Error analysis of the parser outputs is also provided. Although the architecture was designed and tested for Bulgarian, it is applicable to other languages as well, as shown by experiments and evaluation on an English treebank of comparable size.
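As a rough illustration of how shallow syntactic information can be injected into a graph-based parser, the sketch below attaches an auxiliary shallow-tag head next to a simple bilinear arc scorer on top of a BERT encoder. All names, dimensions, and the joint-training setup are assumptions for illustration; the paper's actual layer and training details may differ.

```python
import torch.nn as nn

class ShallowSyntaxParser(nn.Module):
    """Illustrative sketch, not the paper's architecture: a BERT encoder
    with (a) a bilinear arc scorer for graph-based dependency parsing and
    (b) an auxiliary head predicting shallow syntactic tags (e.g. chunks),
    trained jointly so the encoder absorbs shallow syntactic signal."""

    def __init__(self, encoder, hidden=768, n_shallow_tags=10):
        super().__init__()
        self.encoder = encoder                   # e.g. a fine-tuned BERT model
        self.head_mlp = nn.Linear(hidden, hidden)
        self.dep_mlp = nn.Linear(hidden, hidden)
        self.shallow_head = nn.Linear(hidden, n_shallow_tags)  # auxiliary task

    def forward(self, input_ids, attention_mask):
        h = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        # Arc scores: arc_scores[b, i, j] rates token j as the head of token i.
        arc_scores = self.dep_mlp(h) @ self.head_mlp(h).transpose(-1, -2)
        shallow_logits = self.shallow_head(h)    # supervised with shallow tags
        return arc_scores, shallow_logits
```

Under this setup, training would minimize a weighted sum of the parsing loss and the shallow-tag cross-entropy, while at inference the auxiliary head can simply be ignored.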