Larisa Kolesnichenko


2023

pdf bib
Word Substitution with Masked Language Models as Data Augmentation for Sentiment Analysis
Larisa Kolesnichenko | Erik Velldal | Lilja Øvrelid
Proceedings of the Second Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2023)

This paper explores the use of masked language modeling (MLM) for data augmentation (DA), targeting structured sentiment analysis (SSA) for Norwegian based on a dataset of annotated reviews. Considering the limited resources for Norwegian language and the complexity of the annotation task, the aim is to investigate whether this approach to data augmentation can help boost the performance. We report on experiments with substituting words both inside and outside of sentiment annotations, and we also present an error analysis, discussing some of the potential pitfalls of using MLM-based DA for SSA, and suggest directions for future work.