Wiktor Walentynowicz


2023

CodeNLP at SemEval-2023 Task 2: Data Augmentation for Named Entity Recognition by Combination of Sequence Generation Strategies
Michał Marcińczuk | Wiktor Walentynowicz
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

In this article, we present the CodeNLP submission to SemEval-2023 Task 2: MultiCoNER II (Multilingual Complex Named Entity Recognition). Our approach is based on data augmentation by combining various strategies of sequence generation for training. We show that the extended procedure of fine-tuning a pre-trained language model can bring improvements compared to any single strategy. On the development subsets, the improvements were 1.7 pp and 3.1 pp of F-measure for the English and multilingual datasets, respectively. On the test subsets, our models achieved 63.51% and 73.22% Macro F1, respectively.

Wordnet-oriented recognition of derivational relations
Wiktor Walentynowicz | Maciej Piasecki
Proceedings of the 12th Global Wordnet Conference

Derivational relations are an important element in defining meanings, as they help to explore word-formation schemes and predict senses of derivates (derived words). In this work, we analyse different methods of representing derivational forms obtained from WordNet – from quantitative vectors to contextual learned embedding methods – and compare ways of classifying the derivational relations occurring between them. Our research focuses on the explainability of the obtained representations and results. The data source for our research is plWordNet, which is the wordnet of the Polish language and includes a rich set of derivation examples.

2022

Towards a contextualised spatial-diachronic history of literature: mapping emotional representations of the city and the country in Polish fiction from 1864 to 1939
Agnieszka Karlińska | Cezary Rosiński | Jan Wieczorek | Patryk Hubar | Jan Kocoń | Marek Kubis | Stanisław Woźniak | Arkadiusz Margraf | Wiktor Walentynowicz
Proceedings of the 6th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

In this article, we discuss the conditions surrounding the building of historical and literary corpora. We describe the assumptions and method of making the original corpus of the Polish novel (1864-1939). Then, we present the research procedure aimed at demonstrating the variability of the emotional value of the concept of “the city” and “the country” in the texts included in our corpus. The proposed method considers the complex socio-political nature of Central and Eastern Europe, especially the fact that there was no unified Polish state during this period. The method can be easily replicated in studies of the literature of countries with similar specificities.

2021

Enriching plWordNet with morphology
Agnieszka Dziob | Wiktor Walentynowicz
Proceedings of the 11th Global Wordnet Conference

In this paper, we present the process of adding morphological information to the Polish WordNet (plWordNet). We describe the reasons for this connection and the intuitions behind it. We also draw attention to the specifics of Polish morphology. We show in which tasks morphological information is important and how existing methods can be extended to include combined morphological information based on WordNet.

2019

Tagger for Polish Computer Mediated Communication Texts
Wiktor Walentynowicz | Maciej Piasecki | Marcin Oleksy
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

In this paper, we present a morpho-syntactic tagger dedicated to Computer-mediated Communication (CMC) texts in Polish. Its construction is based on an expanded RNN-based neural network adapted to work on noisy texts. Among several techniques, the tagger utilises fastText embedding vectors, sequential character embedding vectors, and Brown clustering for the coarse-grained representation of sentence structures. In addition, a set of manually written rules was proposed for post-processing. The system was trained to disambiguate descriptions of words in relation to Part-of-Speech tags together with the full morphological information in terms of values for the different grammatical categories. We also present an evaluation of several model variants on the gold-standard annotated CMC data, a comparison with the state-of-the-art taggers for Polish, and an error analysis. The proposed tagger shows significantly better results in this domain and demonstrates the viability of adaptation.