Paweł Kędzia


2018

Wordnet-based Evaluation of Large Distributional Models for Polish
Maciej Piasecki | Gabriela Czachor | Arkadiusz Janz | Dominik Kaszewski | Paweł Kędzia
Proceedings of the 9th Global Wordnet Conference

The paper presents the construction of large-scale test datasets for word embeddings on the basis of a very large wordnet. The datasets were then applied to evaluate word embedding models and to assess and compare the usefulness of different word embeddings extracted from a very large corpus of Polish. We also analysed and compared several publicly available models described in the literature. In addition, several large word embedding models built on the basis of a very large Polish corpus are presented.

2017

Graph-Based Approach to Recognizing CST Relations in Polish Texts
Paweł Kędzia | Maciej Piasecki | Arkadiusz Janz
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

This paper presents a supervised approach to the recognition of Cross-document Structure Theory (CST) relations in Polish texts. In the proposed approach, a graph-based representation is constructed for sentences. Graphs are built on the basis of lexicalised syntactic-semantic relations extracted from text. Similarity between sentences is calculated from the graphs, and the similarity values are input to classifiers trained by Logistic Model Tree. Several different graph configurations, as well as graph similarity methods, were analysed for this task. The approach was evaluated on a large open corpus annotated manually with 17 types of selected CST relations. The configuration of experiments was similar to those known from SEMEVAL and we obtained very promising results.

2016

plWordNet in Word Sense Disambiguation task
Maciej Piasecki | Paweł Kędzia | Marlena Orlińska
Proceedings of the 8th Global WordNet Conference (GWC)

The paper explores the application of plWordNet, a very large wordnet of Polish, in weakly supervised Word Sense Disambiguation (WSD). Because plWordNet provides only partial descriptions by glosses and usage examples, and does not include sense-disambiguated glosses, PageRank-based WSD methods perform slightly worse than for English. However, we show that the use of weights for the relation types and the order in which lexical units have been added for sense re-ranking can significantly improve WSD precision. The evaluation was done on two Polish corpora (KPWr and Składnica) including manual WSD. We discuss the fundamental difference in the construction of both corpora and very different test results.

plWordNet 3.0 – a Comprehensive Lexical-Semantic Resource
Marek Maziarz | Maciej Piasecki | Ewa Rudnicka | Stan Szpakowicz | Paweł Kędzia
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

We have released plWordNet 3.0, a very large wordnet for Polish. In addition to what is expected in wordnets – richly interrelated synsets – it contains sentiment and emotion annotations, a large set of multi-word expressions, and a mapping onto WordNet 3.1. Part of the release is enWordNet 1.0, a substantially enlarged copy of WordNet 3.1, with material added to allow for a more complete mapping. The paper discusses the design principles of plWordNet, its content, its statistical portrait, a comparison with similar resources, and a partial list of applications.

2014

Ruled-based, Interlingual Motivated Mapping of plWordNet onto SUMO Ontology
Paweł Kędzia | Maciej Piasecki
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper we study a rule-based approach to mapping plWordNet onto SUMO Upper Ontology on the basis of the already existing mappings: plWordNet – the Princeton WordNet – SUMO. Information acquired from the inter-lingual relations between plWordNet and Princeton WordNet and from the relations between Princeton WordNet and SUMO ontology is used in the proposed rules. Several mapping rules together with the matching examples are presented. The automated mapping results were evaluated in two steps: (i) we automatically checked formal correctness of the mappings for the pairs of plWordNet synset and SUMO concept, (ii) a subset of 160 mapping examples was manually checked by two+one linguists. We analyzed types of the mapping errors and their causes. The proposed rules achieved very high precision, especially when the errors in the resources are taken into account. Because both wordnets were constructed independently, the obtained rules are not trivial and they reveal the differences between the two wordnets and the two languages.

2013

Recognizing semantic relations within Polish noun phrase: A rule-based approach
Paweł Kędzia | Marek Maziarz
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013