KPWr: Towards a Free Corpus of Polish

Bartosz Broda, Michał Marcińczuk, Marek Maziarz, Adam Radziszewski, Adam Wardyński


Abstract
This paper presents our efforts to collect and annotate a free corpus of Polish. The corpus will serve as our training and testing material for experiments with machine learning algorithms. Because others may also benefit from the resource, we are going to release it under a Creative Commons licence, which we hope will remove unnecessary usage restrictions and facilitate reproduction of our experimental results. The corpus is being annotated with various types of linguistic entities: chunks and named entities, selected syntactic and semantic relations, word senses and anaphora. We report on the current state of the project as well as our ultimate goals.
Anthology ID:
L12-1574
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3218–3222
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/965_Paper.pdf
Cite (ACL):
Bartosz Broda, Michał Marcińczuk, Marek Maziarz, Adam Radziszewski, and Adam Wardyński. 2012. KPWr: Towards a Free Corpus of Polish. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 3218–3222, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
KPWr: Towards a Free Corpus of Polish (Broda et al., LREC 2012)