Slawomir Dadas


2023

pdf bib
OPI at SemEval-2023 Task 9: A Simple But Effective Approach to Multilingual Tweet Intimacy Analysis
Slawomir Dadas
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper describes our submission to the SemEval 2023 multilingual tweet intimacy analysis shared task. The goal of the task was to assess the level of intimacy of Twitter posts in ten languages. The proposed approach consists of several steps. First, we perform in-domain pre-training to create a language model adapted to Twitter data. In the next step, we train an ensemble of regression models to expand the training set with pseudo-labeled examples. The extended dataset is used to train the final solution. Our method was ranked first in five out of ten language subtasks, obtaining the highest average score across all languages.

pdf bib
OPI at SemEval-2023 Task 1: Image-Text Embeddings and Multimodal Information Retrieval for Visual Word Sense Disambiguation
Slawomir Dadas
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

The goal of visual word sense disambiguation is to find the image that best matches the provided description of the word’s meaning. It is a challenging problem, requiring approaches that combine language and image understanding. In this paper, we present our submission to SemEval 2023 visual word sense disambiguation shared task. The proposed system integrates multimodal embeddings, learning to rank methods, and knowledge-based approaches. We build a classifier based on the CLIP model, whose results are enriched with additional information retrieved from Wikipedia and lexical databases. Our solution was ranked third in the multilingual task and won in the Persian track, one of the three language subtasks.

2020

pdf bib
Evaluation of Sentence Representations in Polish
Slawomir Dadas | Michał Perełkiewicz | Rafał Poświata
Proceedings of the Twelfth Language Resources and Evaluation Conference

Methods for learning sentence representations have been actively developed in recent years. However, the lack of pre-trained models and datasets annotated at the sentence level has been a problem for low-resource languages such as Polish which led to less interest in applying these methods to language-specific tasks. In this study, we introduce two new Polish datasets for evaluating sentence embeddings and provide a comprehensive evaluation of eight sentence representation methods including Polish and multilingual models. We consider classic word embedding models, recently developed contextual embeddings and multilingual sentence encoders, showing strengths and weaknesses of specific approaches. We also examine different methods of aggregating word vectors into a single sentence vector.