Evaluating Hallucinations in Large Language Models for Bulgarian Language
Proceedings of the 8th Student Research Workshop associated with the International Conference Recent Advances in Natural Language Processing
In this short paper, we introduce the task of evaluating hallucination in large language models for the Bulgarian language. We first define what a hallucination in a large language model is and review the existing methods for measuring hallucinations. Next, we give an overview of the multilingual evaluation of the latest large language models, focusing on their performance in Bulgarian on tasks related to hallucination. We then present a method for evaluating the level of hallucination in a given language without reference data, and report some initial experiments with this method in Bulgarian. Finally, we provide directions for future research on the topic.
The Bulgarian Event Corpus: Overview and Initial NER Experiments
Proceedings of the Thirteenth Language Resources and Evaluation Conference
The paper describes the Bulgarian Event Corpus (BEC). The annotation scheme is based on the CIDOC-CRM ontology and on the English FrameNet, adjusted for our task. It includes two main layers: named entities and events with their roles. The corpus is multi-domain and mainly oriented towards Social Sciences and Humanities (SSH). It will be used for: extracting knowledge and making it available through the Bulgaria-centric Knowledge Graph; further developing an annotation scheme that handles multiple domains in SSH; and training automatic modules for the most important knowledge-based tasks, such as domain-specific and nested NER, NEL, and event detection and profiling. Initial experiments were conducted on the standard NER task due to the complexity of the dataset and the rich NE annotation scheme. The results are promising for some labels and give insight into how to handle the others better. These experiments also serve as error detection modules that would help us redesign the scheme. They are a basis for further, more complex tasks, such as nested NER, NEL and event detection.
Overview on NLP Techniques for Content-based Recommender Systems for Books
Proceedings of the Student Research Workshop Associated with RANLP 2019
Recommender systems are an essential part of today’s largest websites. Without them, it would be hard for users to find the right products and content. One of the most popular methods for recommendations is content-based filtering. It relies on analysing product metadata, a large part of which is textual data. Despite their frequent use, there is still no standard procedure for developing and evaluating content-based recommenders. In this paper, we first examine current approaches to designing, training and evaluating recommender systems based on textual data for book recommendations on the Goodreads website. We then critique existing methods and suggest how natural language processing techniques can be employed to improve content-based recommenders.