Daniel Ziembicki


2024

Polish Discourse Corpus (PDC): Corpus Design, ISO-Compliant Annotation, Data Highlights, and Parser Development
Maciej Ogrodniczuk | Aleksandra Tomaszewska | Daniel Ziembicki | Sebastian Żurowski | Ryszard Tuora | Aleksandra Zwierzchowska
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper presents the Polish Discourse Corpus, a pioneering resource of this kind for Polish and the first corpus in Poland to employ the ISO standard for discourse relation annotation. The Polish Discourse Corpus adopts ISO 24617-8, a segment of the Language Resource Management – Semantic Annotation Framework (SemAF), which outlines a set of core discourse relations adaptable to diverse languages and genres. The paper overviews the corpus architecture, annotation procedures, the challenges that the annotators have encountered, as well as key statistical data concerning discourse relations and connectives in the corpus. It further discusses the initial development phases of a discourse parser tailored to the ISO 24617-8 framework. Evaluations of the efficacy and potential refinement areas of the corpus annotation and parsing strategies are also presented. The final part of the paper touches upon anticipated research plans to improve discourse analysis techniques in the project and to conduct discourse studies involving multiple languages.

2023

Adopting ISO 24617-8 for Discourse Relations Annotation in Polish: Challenges and Future Directions
Sebastian Żurowski | Daniel Ziembicki | Aleksandra Tomaszewska | Maciej Ogrodniczuk | Agata Drozd
Proceedings of the 4th Conference on Language, Data and Knowledge

2019

Named Entity Recognition - Is There a Glass Ceiling?
Tomasz Stanislawek | Anna Wróblewska | Alicja Wójcicka | Daniel Ziembicki | Przemyslaw Biecek
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Recent developments in Named Entity Recognition (NER) have resulted in steadily improving models. However, is there a glass ceiling? Do we know which types of errors are still hard or even impossible to correct? In this paper, we present a detailed analysis of the types of errors made by state-of-the-art machine learning (ML) methods. Our study illustrates the strengths and weaknesses of the Stanford, CMU, FLAIR, ELMo and BERT models, as well as their shared limitations. We also introduce new techniques for improving annotation and the training process, and for checking model quality and stability.