Pavlína Synková


2024

pdf bib
Announcing the Prague Discourse Treebank 3.0
Pavlína Synková | Jiří Mírovský | Lucie Poláková | Magdaléna Rysová
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

We present the Prague Discourse Treebank 3.0 – a new version of the annotation of discourse relations marked by primary and secondary discourse connectives in the data of the Prague Dependency Treebank. Compared to the previous version (PDiT 2.0), the version 3.0 comes with three types of major updates: (i) it brings a largely revised annotation of discourse relations: pragmatic relations have been thoroughly reworked, many inconsistencies across all discourse types have been fixed and previously unclear cases marked in annotators’ comments have been resolved, (ii) it achieves consistency with a Lexicon of Czech Discourse Connectives (CzeDLex), and (iii) it provides the data not only in its native format (Prague Markup Language, discourse relations annotated at the top of the dependency trees), but also in the Penn Discourse Treebank 3.0 format (plain text plus a stand-off discourse annotation) and sense taxonomy. PDiT 3.0 contains 21,662 discourse relations (plus 445 list relations) in 49 thousand sentences.

pdf bib
Cost-Effective Discourse Annotation in the Prague Czech–English Dependency Treebank
Jiří Mírovský | Pavlína Synková | Lucie Polakova | Marie Paclíková
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

We present a cost-effective method for obtaining a high-quality annotation of explicit discourse relations in the Czech part of the Prague Czech–English Dependency Treebank, a corpus of almost 50 thousand sentences coming from the Czech translation of the Wall Street Journal part of the Penn Treebank. We use three different sources of information and combine them to obtain the discourse annotation: (i) annotation projection from the Penn Discourse Treebank 3.0, (ii) manual tectogrammatical (deep syntax) representation of sentences of the corpus, and (iii) the Lexicon of Czech Discourse Connectives CzeDLex. After solving as many discrepancies as possible automatically, the final discourse annotation is achieved by manual inspection of the remaining problematic cases. The discourse annotation of the corpus will be available both in the Prague format (on top of tectogrammatical trees) with the Prague taxonomy of discourse types, and in the Penn format (on plain texts) with the Penn Discourse Treebank 3.0 sense taxonomy.

2020

pdf bib
CzeDLex 0.6 and its Representation in the PML-TQ
Jiří Mírovský | Lucie Poláková | Pavlína Synková
Proceedings of the Twelfth Language Resources and Evaluation Conference

CzeDLex is an electronic lexicon of Czech discourse connectives with its data coming from a large treebank annotated with discourse relations. Its new version CzeDLex 0.6 (as compared with the previous version 0.5, which was published in 2017) is significantly larger with respect to manually processed entries. Also, its structure has been modified to allow for primary connectives to appear with multiple entries for a single discourse sense. The lexicon comes in several formats, being both human and machine readable, and is available for searching in PML Tree Query, a user-friendly and powerful search tool for all kinds of linguistically annotated treebanks. The main purpose of this paper/demo is to present the new version of the lexicon and to demonstrate possibilities of mining various types of information from the lexicon using PML Tree Query; we present several examples of search queries over the lexicon data along with their results. The new version of the lexicon, CzeDLex 0.6, is available on-line and was officially released in December 2019 under the Creative Commons License.

2017

pdf bib
Extracting a Lexicon of Discourse Connectives in Czech from an Annotated Corpus
Pavlína Synková | Magdaléna Rysová | Lucie Poláková | Jiří Mírovský
Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation