Miroslav Blšták


2024

ChatGPT as Your n-th Annotator: Experiments in Leveraging Large Language Models for Social Science Text Annotation in Slovak Language
Endre Hamerlik | Marek Šuppa | Miroslav Blšták | Jozef Kubík | Martin Takáč | Marián Šimko | Andrej Findor
Proceedings of the 4th Workshop on Computational Linguistics for the Political and Social Sciences: Long and short papers

Large Language Models (LLMs) are increasingly influential in Computational Social Science, offering new methods for processing and analyzing data, particularly in lower-resourced language contexts. This study explores the use of OpenAI’s GPT-3.5 Turbo and GPT-4 for automating annotations for a unique news media dataset in a lower-resourced language, focusing on stance classification tasks. Our results reveal that prompting in the native language, explanation generation, and advanced prompting strategies such as Retrieval-Augmented Generation and Chain-of-Thought prompting enhance LLM performance, with GPT-4 notably superior at predicting stance. Further evaluation indicates that LLMs can serve as a useful tool for social science text annotation in lower-resourced languages, notably in identifying inconsistencies in annotation guidelines and annotated datasets.

2022

SlovakBERT: Slovak Masked Language Model
Matúš Pikuliak | Štefan Grivalský | Martin Konôpka | Miroslav Blšták | Martin Tamajka | Viktor Bachratý | Marian Simko | Pavol Balážik | Michal Trnka | Filip Uhlárik
Findings of the Association for Computational Linguistics: EMNLP 2022

We introduce a new Slovak masked language model called SlovakBERT. To the best of our knowledge, this is the first paper discussing Slovak transformer-based language models. We evaluate our model on several NLP tasks and achieve state-of-the-art results. This evaluation is also the first attempt to establish a benchmark for Slovak language models. We publish the masked language model, as well as the fine-tuned models for part-of-speech tagging, sentiment analysis and semantic textual similarity.