Milica Ikonić Nešić

Also published as: Milica Ikonić Nešić

2025

From Zero to Hero: Building Serbian NER from Rules to LLMs
Milica Ikonić Nešić | Sasa Petalinkar | Ranka Stanković | Ruslan Mitkov
Proceedings of the First Workshop on Comparative Performance Evaluation: From Rules to Language Models

Named Entity Recognition (NER) presents specific challenges in Serbian, a morphologically rich language. To address these challenges, a comparative evaluation of distinct model paradigms across diverse text genres was conducted. A rule-based system (SrpNER), a traditional deep learning model (Convolutional Neural Network – CNN), fine-tuned transformer architectures (Jerteh and Tesla), and Large Language Models (LLMs), specifically ChatGPT 4.0 Nano and 4.1 Mini, were evaluated and compared. For the LLMs, a one-shot prompt engineering approach was employed, using prompt instructions aligned with the entity type definitions used in the manual annotation guidelines. Evaluation was performed on three Serbian datasets representing varied domains: newspaper articles, history textbook excerpts, and a sample of literary texts from the srpELTeC collection. The highest performance was consistently achieved by the fine-tuned transformer models, with F1 scores ranging from 0.78 on newspaper articles to 0.96 on the primary school history textbook sample.

2024

Advancing Sentiment Analysis in Serbian Literature: A Zero and Few–Shot Learning Approach Using the Mistral Model
Milica Ikonić Nešić | Saša Petalinkar | Mihailo Škorić | Ranka Stanković | Biljana Rujević
Proceedings of the Sixth International Conference on Computational Linguistics in Bulgaria (CLIB 2024)

This study presents a sentiment analysis of old Serbian novels from the 1840-1920 period, employing the Mistral Large Language Model (LLM) with zero-shot and few-shot learning techniques. The main approach devises research prompts that include guidance text for zero-shot classification and examples for few-shot learning, enabling the LLM to classify sentiments into positive, negative, or objective categories. This methodology aims to streamline sentiment analysis by limiting responses, thereby enhancing classification precision. Python, along with the Hugging Face Transformers and LangChain libraries, serves as our technological backbone, facilitating the creation and refinement of research prompts tailored for sentence-level sentiment analysis. The results for both scenarios indicate that the zero-shot approach outperforms the few-shot approach, achieving an accuracy of 68.2%.

Towards Semantic Interoperability: Parallel Corpora as Linked Data Incorporating Named Entity Linking
Ranka Stanković | Milica Ikonić Nešić | Olja Perisic | Mihailo Škorić | Olivera Kitanović
Proceedings of the 9th Workshop on Linked Data in Linguistics @ LREC-COLING 2024

The paper presents the results of research on the preparation of parallel corpora, focusing on their transformation into RDF graphs using the NLP Interchange Format (NIF) for linguistic annotation. We give an overview of the parallel corpus that was used in this case study, as well as the process of POS tagging, lemmatization, named entity recognition (NER), and named entity linking (NEL), which is implemented using Wikidata. In the first phase of NEL, the main characters and places mentioned in the novels are stored in Wikidata, and in the second phase they are linked with the occurrences of previously annotated entities in the text. Next, we describe the named entity linking (NEL), data conversion to RDF, and incorporation of NIF annotations. The produced NIF files were evaluated through the exploration of a triplestore using SPARQL queries. Finally, the bridging of Linked Data and Digital Humanities research is discussed, as well as some drawbacks related to the verbosity of the transformation. The concept of semantic interoperability in the context of linked data and parallel corpora ensures that data exchanged between systems carries shared and well-defined meanings, enabling effective communication and understanding.

2022

Sentiment Analysis of Serbian Old Novels
Ranka Stanković | Miloš Košprdić | Milica Ikonić Nešić | Tijana Radović
Proceedings of the 2nd Workshop on Sentiment Analysis and Linguistic Linked Data

In this paper we present the first study of Sentiment Analysis (SA) of Serbian novels from the 1840-1920 period. The preparation of the sentiment lexicon was based on three existing lexicons: NRC, AFINN and Bing, with additional extensive corrections. The first phase of dataset refinement included filtering out the words that are not found in the Serbian morphological dictionary, and in the second phase the automatically assigned POS tags and lemmas were manually corrected. The polarity lexicon was extracted, transformed into ontolex-lemon and published as an initial version. The complex inflection system of the Serbian language required expansion of the sentiment lexicon with inflected forms from Serbian morphological dictionaries. A set of sentences for SA was extracted from 120 novels of the Serbian part of the ELTeC collection, labelled for polarity and used for training several models. Several approaches for SA are compared, starting with four variations of the lexicon-based approach and followed by Logistic Regression, Naive Bayes, Decision Tree, Random Forest, SVM and k-NN. The comparison with models trained on a labelled movie reviews dataset indicates that such models cannot successfully be used for sentiment analysis of sentences in old novels.

From ELTeC Text Collection Metadata and Named Entities to Linked-data (and Back)
Milica Ikonić Nešić | Ranka Stanković | Christof Schöch | Mihailo Skoric
Proceedings of the 8th Workshop on Linked Data in Linguistics within the 13th Language Resources and Evaluation Conference

In this paper we present the wikification of the ELTeC (European Literary Text Collection), developed within the COST Action “Distant Reading for European Literary History” (CA16204). ELTeC is a multilingual corpus of novels written in the time period 1840–1920, built to apply distant reading methods and tools to explore European literary history. We present the pipeline that led to the production of the linked dataset: the novels’ metadata retrieval and named entity recognition, transformation, mapping and Wikidata population, followed by named entity linking and export to NIF (NLP Interchange Format). The speeding up of the process of data preparation and import to Wikidata is presented on the use case of seven sub-collections of ELTeC (English, Portuguese, French, Slovenian, German, Hungarian and Serbian). Our goal was to automate the process of preparing and importing information, so OpenRefine and QuickStatements were chosen as the best options. The paper also includes examples of SPARQL queries for retrieval of authors, novel titles, publication places and other metadata with different visualisation options as well as statistical overviews.

Distant Reading in Digital Humanities: Case Study on the Serbian Part of the ELTeC Collection
Ranka Stanković | Cvetana Krstev | Branislava Šandrih Todorović | Dusko Vitas | Mihailo Skoric | Milica Ikonić Nešić
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In this paper we present the Serbian part of the ELTeC multilingual corpus of novels written in the time period 1840-1920. The corpus is being built in order to test various distant reading methods and tools with the aim of re-thinking European literary history. We present the various steps that led to the production of the Serbian sub-collection: the novel selection and retrieval, text preparation, structural annotation, POS-tagging, lemmatization and named entity recognition. The Serbian sub-collection was published on different platforms in order to make it freely available to various users. Several usage examples show that this sub-collection is useful for both close and distant reading approaches.

2021

Serbian NER&Beyond: The Archaic and the Modern Intertwinned
Branislava Šandrih Todorović | Cvetana Krstev | Ranka Stanković | Milica Ikonić Nešić
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

In this work, we present a Serbian literary corpus that is being developed under the umbrella of the “Distant Reading for European Literary History” COST Action CA16204. Using this corpus of novels written more than a century ago, we have developed and made publicly available a Named Entity Recognizer (NER) trained to recognize 7 different named entity types, with a Convolutional Neural Network (CNN) architecture and an F1 score of ≈91% on the test dataset. This model has been further assessed on a separate evaluation dataset. We wrap up with a comparison of the developed model with the existing one, followed by a discussion of the pros and cons of both models.
