Martin Forst

Filling Statistics with Linguistics – Property Design for the Disambiguation of German LFG Parses
Martin Forst
ACL 2007 Workshop on Deep Linguistic Processing

Stochastic Realisation Ranking for a Free Word Order Language
Aoife Cahill | Martin Forst | Christian Rohrer
Proceedings of the Eleventh European Workshop on Natural Language Generation (ENLG 07)

2006

Improving coverage and parsing quality of a large-scale LFG for German
Christian Rohrer | Martin Forst
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

We describe experiments in parsing the German TIGER Treebank. In parsing the complete treebank, 86.44% of the sentences receive full parses; 13.56% receive fragment parses. We discuss the methods used to enhance coverage and parsing quality and we present an evaluation on a gold standard, to our knowledge the first one for a deep grammar of German. Considering the selection performed by our current version of a stochastic disambiguation component, we achieve an f-score of 84.2%, the upper and lower bounds being 87.4% and 82.3% respectively.

The importance of precise tokenizing for deep grammars
Martin Forst | Ronald M. Kaplan
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

We present a non-deterministic finite-state transducer that acts as a tokenizer and normalizer for free text that is input to a broad-coverage LFG of German. We compare the basic tokenizer used in an earlier version of the grammar and the more sophisticated tokenizer that we now use. The revised tokenizer increases the coverage of the grammar in terms of full parses from 68.3% to 73.4% on sentences 8,001 through 10,000 of the TiGer Corpus.