Tadeusz Piotrowski


Lexicalised and non-lexicalized multi-word expressions in WordNet: a cross-encoder approach
Marek Maziarz | Łukasz Grabowski | Tadeusz Piotrowski | Ewa Rudnicka | Maciej Piasecki
Proceedings of the 12th Global Wordnet Conference

Focusing on the recognition of multi-word expressions (MWEs), we address the problem of how MWEs are recorded in WordNet. Not all MWEs recorded in that lexical database can be unambiguously considered lexicalised (e.g., elements of the WordNet taxonomy, quantifier phrases, certain collocations). In this paper, we use a cross-encoder approach to improve on our earlier method of distinguishing between lexicalised and non-lexicalised MWEs found in WordNet, which relied on custom-designed rule-based and statistical approaches. We achieve an F1-measure close to 80% for the class of lexicalised word combinations, easily beating two baselines (a random baseline and a majority-class baseline). The language model also proves better than a feature-based logistic regression model.


Lexical Perspective on Wordnet to Wordnet Mapping
Ewa Rudnicka | Francis Bond | Łukasz Grabowski | Maciej Piasecki | Tadeusz Piotrowski
Proceedings of the 9th Global Wordnet Conference

The paper presents a feature-based model of equivalence targeted at (manual) sense linking between Princeton WordNet and plWordNet. The model incorporates insights from lexicographic and translation theories of bilingual equivalence and draws on the results of earlier synset-level mapping of nouns between Princeton WordNet and plWordNet. It takes into account all basic aspects of language, such as form, meaning and function, and supplements them with (parallel) corpus frequency and translatability. Three types of equivalence are distinguished, namely strong, regular and weak, depending on the degree of conformity with the proposed features. The presented solutions are language-neutral and can easily be applied to language pairs other than Polish and English. Sense-level mapping is more fine-grained than the existing synset-level mappings and thus has great potential for both human and machine translation.