Václav Novák


English-Czech MT in 2008
Ondřej Bojar | David Mareček | Václav Novák | Martin Popel | Jan Ptáček | Jan Rouš | Zdeněk Žabokrtský
Proceedings of the Fourth Workshop on Statistical Machine Translation

Large-scale Semantic Networks: Annotation and Evaluation
Václav Novák | Sven Hartrumpf | Keith Hall
Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions (SEW-2009)

Unsupervised Detection of Annotation Inconsistencies Using Apriori Algorithm
Václav Novák | Magda Razímová
Proceedings of the Third Linguistic Annotation Workshop (LAW III)

Comparison of Classification and Ranking Approaches to Pronominal Anaphora Resolution in Czech
Giang Linh Ngụy | Václav Novák | Zdeněk Žabokrtský
Proceedings of the SIGDIAL 2009 Conference


Automatic alignment of Czech and English deep syntactic dependency trees
David Mareček | Zdeněk Žabokrtský | Václav Novák
Proceedings of the 12th Annual conference of the European Association for Machine Translation

Inter-sentential Coreferences in Semantic Networks: An Evaluation of Manual Annotation
Václav Novák | Keith Hall
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We present an evaluation of inter-sentential coreference annotation in the context of manually created semantic networks. The semantic networks are constructed independently by each annotator and require an entity mapping prior to evaluating the coreference. We introduce a model used for mapping the semantic entities as well as an algorithm used for our evaluation task. Finally, we report the raw statistics for inter-annotator agreement and describe the inherent difficulty in evaluating coreference in semantic networks.


Cedit – Semantic Networks Manual Annotation Tool
Václav Novák
Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT)


Perspectives of Turning Prague Dependency Treebank into a Knowledge Base
Václav Novák | Jan Hajič
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Recently, the Prague Dependency Treebank 2.0 (PDT 2.0) has emerged as the largest text corpus annotated on the level of tectogrammatical representation (“linguistic meaning”) described in Sgall et al. (2004), containing about 0.8 million words (see Hajic (2004)). We hope that this level of annotation is so close to the meaning of the utterances contained in the corpus that it should enable us to automatically transform its texts into the form of a knowledge base, usable for information extraction, question answering, summarization, etc. We can use Multilayered Extended Semantic Networks (MultiNet), described in Helbig (2006), as the target formalism. In this paper we discuss the suitability of such an approach and some of the main issues that will arise in the process. In section 1, we introduce the formalisms underlying PDT 2.0 and MultiNet; in section 2, we describe the role MultiNet can play in the system of Functional Generative Description (FGD); section 3 discusses issues of automatic conversion to MultiNet; and section 4 gives some conclusions.

On Distance between Deep Syntax and Semantic Representation
Václav Novák
Proceedings of the Workshop on Frontiers in Linguistically Annotated Corpora 2006


Corrective Modeling for Non-Projective Dependency Parsing
Keith Hall | Václav Novák
Proceedings of the Ninth International Workshop on Parsing Technology