Extended and Enhanced Polish Dependency Bank in Universal Dependencies Format

Alina Wróblewska


Abstract
The paper presents PDBUD, the largest Polish Dependency Bank in Universal Dependencies format, with 22K trees and 352K tokens. PDBUD builds on its previous version, the Polish UD treebank (PL-SZ), and contains all 8K PL-SZ trees, which were checked and, where necessary, corrected for the current edition. A further 14K trees are automatically converted from a new version of the Polish Dependency Bank. The PDBUD trees are expanded with enhanced edges that encode the shared dependents and shared governors of coordinated conjuncts, and with the semantic roles of some dependents. The evaluation experiments show that PDBUD is large enough to train a high-quality graph-based dependency parser for Polish.
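To illustrate what an enhanced edge for a shared governor looks like, the following is a minimal sketch, not the conversion procedure used for PDBUD. It assumes the standard UD convention that every non-first conjunct attaches to the first conjunct with the `conj` relation. The function name `enhance_conjunct_governors`, the token dictionaries, and the example sentence are hypothetical. Shared dependents are not handled, because deciding whether a dependent is shared generally requires annotation that a basic tree does not contain.

```python
def enhance_conjunct_governors(tokens):
    """Return a set of (head, dependent, relation) edges.

    The set holds all basic tree edges plus one extra edge per
    non-first conjunct, linking it to the governor of the first
    conjunct with the first conjunct's relation.
    """
    by_id = {t["id"]: t for t in tokens}
    # Start from the basic dependency tree.
    edges = {(t["head"], t["id"], t["deprel"]) for t in tokens}
    for t in tokens:
        if t["deprel"] == "conj":
            first = by_id[t["head"]]  # the first conjunct
            # Share the first conjunct's governor with this conjunct.
            edges.add((first["head"], t["id"], first["deprel"]))
    return edges


# Hypothetical example: "Widzę Jana i Marię" ("I see Jan and Maria").
tokens = [
    {"id": 1, "form": "Widzę", "head": 0, "deprel": "root"},
    {"id": 2, "form": "Jana", "head": 1, "deprel": "obj"},
    {"id": 3, "form": "i", "head": 4, "deprel": "cc"},
    {"id": 4, "form": "Marię", "head": 2, "deprel": "conj"},
]
edges = enhance_conjunct_governors(tokens)
```

In this example the only added edge makes "Marię" an additional `obj` dependent of "Widzę", alongside its basic `conj` attachment to "Jana".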
Anthology ID:
W18-6020
Volume:
Proceedings of the Second Workshop on Universal Dependencies (UDW 2018)
Month:
November
Year:
2018
Address:
Brussels, Belgium
Editors:
Marie-Catherine de Marneffe, Teresa Lynn, Sebastian Schuster
Venue:
UDW
Publisher:
Association for Computational Linguistics
Pages:
173–182
URL:
https://aclanthology.org/W18-6020
DOI:
10.18653/v1/W18-6020
Cite (ACL):
Alina Wróblewska. 2018. Extended and Enhanced Polish Dependency Bank in Universal Dependencies Format. In Proceedings of the Second Workshop on Universal Dependencies (UDW 2018), pages 173–182, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
Extended and Enhanced Polish Dependency Bank in Universal Dependencies Format (Wróblewska, UDW 2018)
PDF:
https://aclanthology.org/W18-6020.pdf