This paper presents a joint model for morphological and dependency analysis based on automatically acquired lexical knowledge. This model takes advantage of rich lexical knowledge to simultaneously resolve word segmentation, POS, and dependency ambiguities. In our experiments on Japanese, we show the effectiveness of our joint model over conventional pipeline models.
In this paper, we present an approach to improve the accuracy of a strong transition-based dependency parser by exploiting dependency language models that are extracted from a large parsed corpus. We integrated a small number of features based on the dependency language models into the parser. To demonstrate the effectiveness of the proposed approach, we evaluate our parser on standard English and Chinese data where the base parser could achieve competitive accuracy scores. Our enhanced parser achieved state-of-the-art accuracy on Chinese data and competitive results on English data. We gained a large absolute improvement of one point (UAS) on Chinese and 0.5 points for English.
We present a systematic analysis of lexicalized vs. delexicalized parsing in low-resource scenarios, and propose a methodology to choose one method over another under certain conditions. We create a set of simulation experiments on 41 languages and apply our findings to 9 low-resource languages. Experimental results show that our methodology chooses the best approach in 8 out of 9 cases.
Neural part-of-speech tagging has achieved competitive results with the incorporation of character-based and pre-trained word embeddings. In this paper, we show that a state-of-the-art bi-LSTM tagger can benefit from using information from morphosyntactic lexicons as additional input. The tagger, trained on several dozen languages, shows a consistent, average improvement when using lexical information, even when also using character-based embeddings, thus showing the complementarity of the different sources of lexical information. The improvements are particularly important for the smaller datasets.
We present a low-rank multi-linear model for the task of solving prepositional phrase attachment ambiguity (PP task). Our model exploits tensor products of word embeddings, capturing all possible conjunctions of latent embeddings. Our results on a wide range of datasets and task settings show that tensor products are the best compositional operation and that a relatively simple multi-linear model that uses only word embeddings of lexical features can outperform more complex non-linear architectures that exploit the same information. Our proposed model gives the current best reported performance on an out-of-domain evaluation and performs competively on out-of-domain dependency parsing datasets.
This opinion paper proposes the use of parallel treebank as learner corpus. We show how an L1-L2 parallel treebank — i.e., parse trees of non-native sentences, aligned to the parse trees of their target hypotheses — can facilitate retrieval of sentences with specific learner errors. We argue for its benefits, in terms of corpus re-use and interoperability, over a conventional learner corpus annotated with error tags. As a proof of concept, we conduct a case study on word-order errors made by learners of Chinese as a foreign language. We report precision and recall in retrieving a range of word-order error categories from L1-L2 tree pairs annotated in the Universal Dependency framework.
This paper applies parsing technology to the task of syntactic simplification of English sentences, focusing on the identification of text spans that can be removed from a complex sentence. We report the most comprehensive evaluation to-date on this task, using a dataset of sentences that exhibit simplification based on coordination, subordination, punctuation/parataxis, adjectival clauses, participial phrases, and appositive phrases. We train a decision tree with features derived from text span length, POS tags and dependency relations, and show that it significantly outperforms a parser-only baseline.
In applying word-based dependency parsing such as Universal Dependencies (UD) to Japanese, the uncertainty of word segmentation emerges for defining a word unit of the dependencies. We introduce the following hierarchical word structures to dependency parsing in Japanese: morphological units (a short unit word, SUW) and syntactic units (a long unit word, LUW). An SUW can be used to segment a sentence consistently, while it is too short to represent syntactic construction. An LUW is a unit including functional multiwords and LUW-based analysis facilitates the capturing of syntactic structure and makes parsing results more precise than SUW-based analysis. This paper describes the results of a feasibility study on the ability and the effectiveness of parsing methods based on hierarchical word structure (LUW chunking+parsing) in comparison to single layer word structure (SUW parsing). We also show joint analysis of LUW-chunking and dependency parsing improves the performance of identifying predicate-argument structures, while there is not much difference between overall results of them. not much difference between overall results of them.
We investigate the problem of parsing conversational data of morphologically-rich languages such as Hindi where argument scrambling occurs frequently. We evaluate a state-of-the-art non-linear transition-based parsing system on a new dataset containing 506 dependency trees for sentences from Bollywood (Hindi) movie scripts and Twitter posts of Hindi monolingual speakers. We show that a dependency parser trained on a newswire treebank is strongly biased towards the canonical structures and degrades when applied to conversational data. Inspired by Transformational Generative Grammar (Chomsky, 1965), we mitigate the sampling bias by generating all theoretically possible alternative word orders of a clause from the existing (kernel) structures in the treebank. Training our parser on canonical and transformed structures improves performance on conversational data by around 9% LAS over the baseline newswire parser.
Syntactic annotation is costly and not available for the vast majority of the world’s languages. We show that sometimes we can do away with less labeled data by exploiting more readily available forms of mark-up. Specifically, we revisit an idea from Valentin Spitkovsky’s work (2010), namely that hyperlinks typically bracket syntactic constituents or chunks. We strengthen his results by showing that not only can hyperlinks help in low resource scenarios, exemplified here by Quechua, but learning from hyperlinks can also improve state-of-the-art NLP models for English newswire. We also present out-of-domain evaluation on English Ontonotes 4.0.
PP-attachments are an important source of errors in parsing natural language. We propose in this article to use data coming from a multimodal corpus, combining textual, visual and conceptual information, as well as a correction strategy, to propose alternative attachments in the output of a parser.
Deep dependency parsing can be cast as the search for maximum acyclic subgraphs in weighted digraphs. Because this search problem is intractable in the general case, we consider its restriction to the class of 1-endpoint-crossing (1ec) graphs, which has high coverage on standard data sets. Our main contribution is a characterization of 1ec graphs as a subclass of the graphs with pagenumber at most 3. Building on this we show how to extend an existing parsing algorithm for 1-endpoint-crossing trees to the full class. While the runtime complexity of the extended algorithm is polynomial in the length of the input sentence, it features a large constant, which poses a challenge for practical implementations.
We present a new transition system with word reordering for unrestricted non-projective dependency parsing. Our system is based on decomposed arc-eager rather than arc-standard, which allows more flexible ambiguity resolution between a local projective and non-local crossing attachment. In our experiment on Universal Dependencies 2.0, we find our parser outperforms the ordinary swap-based parser particularly on languages with a large amount of non-projectivity.
In this paper, we extend the arc-hybrid system for transition-based parsing with a swap transition that enables reordering of the words and construction of non-projective trees. Although this extension breaks the arc-decomposability of the transition system, we show how the existing dynamic oracle for this system can be modified and combined with a static oracle only for the swap transition. Experiments on 5 languages show that the new system gives competitive accuracy and is significantly better than a system trained with a purely static oracle.
Encoder-decoder neural networks have been used for many NLP tasks, such as neural machine translation. They have also been applied to constituent parsing by using bracketed tree structures as a target language, translating input sentences into syntactic trees. A more commonly used method to linearize syntactic trees is the shift-reduce system, which uses a sequence of transition-actions to build trees. We empirically investigate the effectiveness of applying the encoder-decoder network to transition-based parsing. On standard benchmarks, our system gives comparable results to the stack LSTM parser for dependency parsing, and significantly better results compared to the aforementioned parser for constituent parsing, which uses bracketed tree formats.
We present a neural transition-based parser for spinal trees, a dependency representation of constituent trees. The parser uses Stack-LSTMs that compose constituent nodes with dependency-based derivations. In experiments, we show that this model adapts to different styles of dependency relations, but this choice has little effect for predicting constituent structure, suggesting that LSTMs induce useful states by themselves.
We generalize coarse-to-fine parsing to grammar formalisms that are more expressive than PCFGs and/or describe languages of trees or graphs. We evaluate our algorithm on PCFG, PTAG, and graph parsing. While we achieve the expected performance gains on PCFGs, coarse-to-fine does not help for PTAG and can even slow down parsing for graphs. We discuss the implications of this finding.
To improve grammatical function labelling for German, we augment the labelling component of a neural dependency parser with a decision history. We present different ways to encode the history, using different LSTM architectures, and show that our models yield significant improvements, resulting in a LAS for German that is close to the best result from the SPMRL 2014 shared task (without the reranker).