Bich-Ngoc Do

2020

Parsers Know Best: German PP Attachment Revisited
Bich-Ngoc Do | Ines Rehbein
Proceedings of the 28th International Conference on Computational Linguistics

In the paper, we revisit the PP attachment problem which has been identified as one of the major sources for parser errors and discuss shortcomings of recent work. In particular, we show that using gold information for the extraction of attachment candidates as well as a missing comparison of the system’s output to the output of a full syntactic parser leads to an overly optimistic assessment of the results. We address these issues by presenting a realistic evaluation of the potential of different PP attachment systems, using fully predicted information as system input. We compare our results against the output of a strong neural parser and show that the full parsing approach is superior to modeling PP attachment disambiguation as a separate task.

pdf bib abs

Evaluating a Dependency Parser on DeReKo
Peter Fankhauser | Bich-Ngoc Do | Marc Kupietz
Proceedings of the 8th Workshop on Challenges in the Management of Large Corpora

We evaluate a graph-based dependency parser on DeReKo, a large corpus of contemporary German. The dependency parser is trained on the German dataset from the SPMRL 2014 Shared Task which contains text from the news domain, whereas DeReKo also covers other domains including fiction, science, and technology. To avoid the need for costly manual annotation of the corpus, we use the parser’s probability estimates for unlabeled and labeled attachment as main evaluation criterion. We show that these probability estimates are highly correlated with the actual attachment scores on a manually annotated test set. On this basis, we compare estimated parsing scores for the individual domains in DeReKo, and show that the scores decrease with increasing distance of a domain to the training corpus.

pdf bib abs

Neural Reranking for Dependency Parsing: An Evaluation
Bich-Ngoc Do | Ines Rehbein
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Recent work has shown that neural rerankers can improve results for dependency parsing over the top k trees produced by a base parser. However, all neural rerankers so far have been evaluated on English and Chinese only, both languages with a configurational word order and poor morphology. In the paper, we re-assess the potential of successful neural reranking models from the literature on English and on two morphologically rich(er) languages, German and Czech. In addition, we introduce a new variation of a discriminative reranker based on graph convolutional networks (GCNs). We show that the GCN not only outperforms previous models on English but is the only model that is able to improve results over the baselines on German and Czech. We explain the differences in reranking performance based on an analysis of a) the gold tree ratio and b) the variety in the k-best lists.

2019

pdf bib

tweeDe – A Universal Dependencies treebank for German tweets
Ines Rehbein | Josef Ruppenhofer | Bich-Ngoc Do
Proceedings of the 18th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2019)

2017

pdf bib

Universal Dependencies are Hard to Parse – or are They?
Ines Rehbein | Julius Steen | Bich-Ngoc Do | Anette Frank
Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017)

pdf bib abs

What do we need to know about an unknown word when parsing German
Bich-Ngoc Do | Ines Rehbein | Anette Frank
Proceedings of the First Workshop on Subword and Character Level Models in NLP

We propose a new type of subword embedding designed to provide more information about unknown compounds, a major source for OOV words in German. We present an extrinsic evaluation where we use the compound embeddings as input to a neural dependency parser and compare the results to the ones obtained with other types of embeddings. Our evaluation shows that adding compound embeddings yields a significant improvement of 2% LAS over using word embeddings when no POS information is available. When adding POS embeddings to the input, however, the effect levels out. This suggests that it is not the missing information about the semantics of the unknown words that causes problems for parsing German, but the lack of morphological information for unknown words. To augment our evaluation, we also test the new embeddings in a language modelling task that requires both syntactic and semantic information.

pdf bib abs

Evaluating LSTM models for grammatical function labelling
Bich-Ngoc Do | Ines Rehbein
Proceedings of the 15th International Conference on Parsing Technologies

To improve grammatical function labelling for German, we augment the labelling component of a neural dependency parser with a decision history. We present different ways to encode the history, using different LSTM architectures, and show that our models yield significant improvements, resulting in a LAS for German that is close to the best result from the SPMRL 2014 shared task (without the reranker).

Co-authors

Julius Steen 1

Venues

SCLeM1

SyntaxFest1

TLT1

Fix author