2013
Learning to lemmatise Polish noun phrases
Adam Radziszewski
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Evaluation of baseline information retrieval for Polish open-domain Question Answering system
Michał Marcińczuk | Adam Radziszewski | Maciej Piasecki | Dominik Piasecki | Marcin Ptak
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013
2012
KPWr: Towards a Free Corpus of Polish
Bartosz Broda | Michał Marcińczuk | Marek Maziarz | Adam Radziszewski | Adam Wardyński
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
This paper presents our efforts to collect and annotate a free Polish corpus. The corpus will serve us as training and testing material for experiments with machine learning algorithms. As others may also benefit from the resource, we are going to release it under a Creative Commons licence, which we hope will remove unnecessary usage restrictions and also facilitate reproduction of our experimental results. The corpus is being annotated with various types of linguistic entities: chunks and named entities, selected syntactic and semantic relations, word senses, and anaphora. We report on the current state of the project as well as our ultimate goals.
2011
Maca – a configurable tool to integrate Polish morphological data
Adam Radziszewski | Tomasz Śniatowski
Proceedings of the Second International Workshop on Free/Open-Source Rule-Based Machine Translation
Several morphological analysers exist for Polish. Most of these, however, are non-free resources. What is more, different analysers employ different tagsets and tokenisation strategies. This situation calls for a simple and universal framework to join different sources of morphological information, including the existing resources as well as user-provided dictionaries. We present such a configurable framework, in which users write simple configuration files that define tokenisation strategies and the behaviour of morphological analysers, including simple tagset conversion.