Stefan Marinov
2025
Bulgarian Event Extraction with LLMs
Kiril Simov | Nikolay Paev | Petya Osenova | Stefan Marinov
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
This paper presents experiments with two large language models (LLMs), T5 and Llama, for extracting events from a Bulgarian event corpus. We pretrained both models on a 35-billion-token Bulgarian corpus. Extraction is performed within the context of a single sentence. Our approach aims to balance the ACE-oriented approach, which uses triggers for event detection, and the MUC-oriented one, which uses more general event types. The evaluation relies on the IoU (Intersection over Union) of token spans and has two parts. The first assesses the predicted event token span; if the span is correct, the semantic roles within the event are then checked. The second assesses the triple consisting of an event type, its semantic roles, and participants. The results are promising. A qualitative evaluation is also provided.
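As an illustration of the span-level measure mentioned in the abstract, the following is a minimal sketch of IoU over token spans. It is not the authors' evaluation code: the `(start, end)` span representation with an exclusive end, the function names `span_iou` and `span_is_correct`, and the 0.5 threshold are all assumptions made for this example, since the abstract does not specify them.

```python
from typing import Tuple

# Assumed convention: a span is (start, end) in token indices, end exclusive.
Span = Tuple[int, int]


def span_iou(pred: Span, gold: Span) -> float:
    """Intersection over Union of two token spans."""
    pred_tokens = set(range(pred[0], pred[1]))
    gold_tokens = set(range(gold[0], gold[1]))
    union = pred_tokens | gold_tokens
    if not union:
        # Two empty spans share no tokens; treat the score as 0.
        return 0.0
    return len(pred_tokens & gold_tokens) / len(union)


def span_is_correct(pred: Span, gold: Span, threshold: float = 0.5) -> bool:
    """Hypothetical acceptance rule: the threshold value is an assumption."""
    return span_iou(pred, gold) >= threshold


if __name__ == "__main__":
    # Predicted span covers tokens 2..5, gold span covers tokens 4..7:
    # 2 shared tokens out of 6 distinct tokens.
    print(span_iou((2, 6), (4, 8)))
    print(span_is_correct((0, 4), (0, 3)))
```

Under this sketch, a two-stage evaluation like the one described would first call `span_is_correct` on the predicted event span and compare semantic roles only when it passes.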