Announcing Prague Czech-English Dependency Treebank 2.0
Jan Hajič, Eva Hajičová, Jarmila Panevová, Petr Sgall, Ondřej Bojar, Silvie Cinková, Eva Fučíková, Marie Mikulová, Petr Pajas, Jan Popelka, Jiří Semecký, Jana Šindlerová, Jan Štěpánek, Josef Toman, Zdeňka Urešová, Zdeněk Žabokrtský
Abstract
We introduce a substantial update of the Prague Czech-English Dependency Treebank, a parallel corpus manually annotated at the deep syntactic layer of linguistic representation. The English part consists of the Wall Street Journal (WSJ) section of the Penn Treebank. The Czech part was translated from the English source sentence by sentence. This paper gives a high level overview of the underlying linguistic theory (the so-called tectogrammatical annotation) with some details of the most important features like valency annotation, ellipsis reconstruction or coreference.
- Anthology ID:
- L12-1280
- Volume:
- Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
- Month:
- May
- Year:
- 2012
- Address:
- Istanbul, Turkey
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- Pages:
- 3153–3160
- URL:
- http://www.lrec-conf.org/proceedings/lrec2012/pdf/510_Paper.pdf
- Bibkey:
- hajic-etal-2012-announcing
- Cite (ACL):
- Jan Hajič, Eva Hajičová, Jarmila Panevová, Petr Sgall, Ondřej Bojar, Silvie Cinková, Eva Fučíková, Marie Mikulová, Petr Pajas, Jan Popelka, Jiří Semecký, Jana Šindlerová, Jan Štěpánek, Josef Toman, Zdeňka Urešová, and Zdeněk Žabokrtský. 2012. Announcing Prague Czech-English Dependency Treebank 2.0. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 3153–3160, Istanbul, Turkey. European Language Resources Association (ELRA).
- Cite (Informal):
- Announcing Prague Czech-English Dependency Treebank 2.0 (Hajič et al., LREC 2012)
- PDF:
- http://www.lrec-conf.org/proceedings/lrec2012/pdf/510_Paper.pdf
Export citation
@inproceedings{hajic-etal-2012-announcing,
    title = "Announcing {P}rague {C}zech-{E}nglish {D}ependency {T}reebank 2.0",
    author = "Haji{\v{c}}, Jan and
      Haji{\v{c}}ov{\'a}, Eva and
      Panevov{\'a}, Jarmila and
      Sgall, Petr and
      Bojar, Ond{\v{r}}ej and
      Cinkov{\'a}, Silvie and
      Fu{\v{c}}{\'\i}kov{\'a}, Eva and
      Mikulov{\'a}, Marie and
      Pajas, Petr and
      Popelka, Jan and
      Semeck{\'y}, Ji{\v{r}}{\'\i} and
      {\v{S}}indlerov{\'a}, Jana and
      {\v{S}}t{\v{e}}p{\'a}nek, Jan and
      Toman, Josef and
      Ure{\v{s}}ov{\'a}, Zde{\v{n}}ka and
      {\v{Z}}abokrtsk{\'y}, Zden{\v{e}}k",
    editor = "Calzolari, Nicoletta and
      Choukri, Khalid and
      Declerck, Thierry and
      Do{\u{g}}an, Mehmet U{\u{g}}ur and
      Maegaard, Bente and
      Mariani, Joseph and
      Moreno, Asuncion and
      Odijk, Jan and
      Piperidis, Stelios",
    booktitle = "Proceedings of the Eighth International Conference on Language Resources and Evaluation ({LREC}'12)",
    month = may,
    year = "2012",
    address = "Istanbul, Turkey",
    publisher = "European Language Resources Association (ELRA)",
    url = "http://www.lrec-conf.org/proceedings/lrec2012/pdf/510_Paper.pdf",
    pages = "3153--3160",
    abstract = "We introduce a substantial update of the Prague Czech-English Dependency Treebank, a parallel corpus manually annotated at the deep syntactic layer of linguistic representation. The English part consists of the Wall Street Journal (WSJ) section of the Penn Treebank. The Czech part was translated from the English source sentence by sentence. This paper gives a high level overview of the underlying linguistic theory (the so-called tectogrammatical annotation) with some details of the most important features like valency annotation, ellipsis reconstruction or coreference.",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="hajic-etal-2012-announcing">
    <titleInfo>
      <title>Announcing Prague Czech-English Dependency Treebank 2.0</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Jan</namePart>
      <namePart type="family">Hajič</namePart>
      <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role>
    </name>
    <name type="personal">
      <namePart type="given">Eva</namePart>
      <namePart type="family">Hajičová</namePart>
      <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role>
    </name>
    <name type="personal">
      <namePart type="given">Jarmila</namePart>
      <namePart type="family">Panevová</namePart>
      <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role>
    </name>
    <name type="personal">
      <namePart type="given">Petr</namePart>
      <namePart type="family">Sgall</namePart>
      <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role>
    </name>
    <name type="personal">
      <namePart type="given">Ondřej</namePart>
      <namePart type="family">Bojar</namePart>
      <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role>
    </name>
    <name type="personal">
      <namePart type="given">Silvie</namePart>
      <namePart type="family">Cinková</namePart>
      <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role>
    </name>
    <name type="personal">
      <namePart type="given">Eva</namePart>
      <namePart type="family">Fučíková</namePart>
      <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role>
    </name>
    <name type="personal">
      <namePart type="given">Marie</namePart>
      <namePart type="family">Mikulová</namePart>
      <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role>
    </name>
    <name type="personal">
      <namePart type="given">Petr</namePart>
      <namePart type="family">Pajas</namePart>
      <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role>
    </name>
    <name type="personal">
      <namePart type="given">Jan</namePart>
      <namePart type="family">Popelka</namePart>
      <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role>
    </name>
    <name type="personal">
      <namePart type="given">Jiří</namePart>
      <namePart type="family">Semecký</namePart>
      <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role>
    </name>
    <name type="personal">
      <namePart type="given">Jana</namePart>
      <namePart type="family">Šindlerová</namePart>
      <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role>
    </name>
    <name type="personal">
      <namePart type="given">Jan</namePart>
      <namePart type="family">Štěpánek</namePart>
      <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role>
    </name>
    <name type="personal">
      <namePart type="given">Josef</namePart>
      <namePart type="family">Toman</namePart>
      <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role>
    </name>
    <name type="personal">
      <namePart type="given">Zdeňka</namePart>
      <namePart type="family">Urešová</namePart>
      <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role>
    </name>
    <name type="personal">
      <namePart type="given">Zdeněk</namePart>
      <namePart type="family">Žabokrtský</namePart>
      <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role>
    </name>
    <originInfo>
      <dateIssued>2012-05</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
      <titleInfo>
        <title>Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12)</title>
      </titleInfo>
      <name type="personal">
        <namePart type="given">Nicoletta</namePart>
        <namePart type="family">Calzolari</namePart>
        <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role>
      </name>
      <name type="personal">
        <namePart type="given">Khalid</namePart>
        <namePart type="family">Choukri</namePart>
        <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role>
      </name>
      <name type="personal">
        <namePart type="given">Thierry</namePart>
        <namePart type="family">Declerck</namePart>
        <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role>
      </name>
      <name type="personal">
        <namePart type="given">Mehmet</namePart>
        <namePart type="given">Uğur</namePart>
        <namePart type="family">Doğan</namePart>
        <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role>
      </name>
      <name type="personal">
        <namePart type="given">Bente</namePart>
        <namePart type="family">Maegaard</namePart>
        <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role>
      </name>
      <name type="personal">
        <namePart type="given">Joseph</namePart>
        <namePart type="family">Mariani</namePart>
        <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role>
      </name>
      <name type="personal">
        <namePart type="given">Asuncion</namePart>
        <namePart type="family">Moreno</namePart>
        <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role>
      </name>
      <name type="personal">
        <namePart type="given">Jan</namePart>
        <namePart type="family">Odijk</namePart>
        <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role>
      </name>
      <name type="personal">
        <namePart type="given">Stelios</namePart>
        <namePart type="family">Piperidis</namePart>
        <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role>
      </name>
      <originInfo>
        <publisher>European Language Resources Association (ELRA)</publisher>
        <place>
          <placeTerm type="text">Istanbul, Turkey</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>We introduce a substantial update of the Prague Czech-English Dependency Treebank, a parallel corpus manually annotated at the deep syntactic layer of linguistic representation. The English part consists of the Wall Street Journal (WSJ) section of the Penn Treebank. The Czech part was translated from the English source sentence by sentence. This paper gives a high level overview of the underlying linguistic theory (the so-called tectogrammatical annotation) with some details of the most important features like valency annotation, ellipsis reconstruction or coreference.</abstract>
    <identifier type="citekey">hajic-etal-2012-announcing</identifier>
    <location>
      <url>http://www.lrec-conf.org/proceedings/lrec2012/pdf/510_Paper.pdf</url>
    </location>
    <part>
      <date>2012-05</date>
      <extent unit="page">
        <start>3153</start>
        <end>3160</end>
      </extent>
    </part>
  </mods>
</modsCollection>
%0 Conference Proceedings
%T Announcing Prague Czech-English Dependency Treebank 2.0
%A Hajič, Jan
%A Hajičová, Eva
%A Panevová, Jarmila
%A Sgall, Petr
%A Bojar, Ondřej
%A Cinková, Silvie
%A Fučíková, Eva
%A Mikulová, Marie
%A Pajas, Petr
%A Popelka, Jan
%A Semecký, Jiří
%A Šindlerová, Jana
%A Štěpánek, Jan
%A Toman, Josef
%A Urešová, Zdeňka
%A Žabokrtský, Zdeněk
%Y Calzolari, Nicoletta
%Y Choukri, Khalid
%Y Declerck, Thierry
%Y Doğan, Mehmet Uğur
%Y Maegaard, Bente
%Y Mariani, Joseph
%Y Moreno, Asuncion
%Y Odijk, Jan
%Y Piperidis, Stelios
%S Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12)
%D 2012
%8 May
%I European Language Resources Association (ELRA)
%C Istanbul, Turkey
%F hajic-etal-2012-announcing
%X We introduce a substantial update of the Prague Czech-English Dependency Treebank, a parallel corpus manually annotated at the deep syntactic layer of linguistic representation. The English part consists of the Wall Street Journal (WSJ) section of the Penn Treebank. The Czech part was translated from the English source sentence by sentence. This paper gives a high level overview of the underlying linguistic theory (the so-called tectogrammatical annotation) with some details of the most important features like valency annotation, ellipsis reconstruction or coreference.
%U http://www.lrec-conf.org/proceedings/lrec2012/pdf/510_Paper.pdf
%P 3153-3160