2021
On Finding the K-best Non-projective Dependency Trees
Ran Zmigrod

Tim Vieira

Ryan Cotterell
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
The connection between the maximum spanning tree in a directed graph and the best dependency tree of a sentence has been exploited by the NLP community. However, for many dependency parsing schemes, an important detail of this approach is that the spanning tree must have exactly one edge emanating from the root. While work has been done to efficiently solve this problem for finding the one-best dependency tree, no research has attempted to extend this solution to finding the K-best dependency trees. This is arguably a more important extension as a larger proportion of decoded trees will not be subject to the root constraint of dependency trees. Indeed, we show that the rate of root constraint violations increases by an average of 13 times when decoding with K=50 as opposed to K=1. In this paper, we provide a simplification of the K-best spanning tree algorithm of Camerini et al. (1980). Our simplification allows us to obtain a constant-time speedup over the original algorithm. Furthermore, we present a novel extension of the algorithm for decoding the K-best dependency trees of a graph which are subject to a root constraint.
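The root-constrained decoding problem the abstract describes can be illustrated with a brute-force sketch (this is not the paper's algorithm, which works efficiently; the function name and exponential-time enumeration here are purely illustrative for tiny graphs):

```python
import itertools

def brute_force_kbest(weights, K):
    """Enumerate the K highest-weight dependency trees over nodes
    {0..n-1} with root 0, keeping only trees with exactly one edge
    emanating from the root. `weights[h][m]` is the weight of the
    edge h -> m. Exponential time; tiny graphs only.
    """
    n = len(weights)
    trees = []
    # A candidate tree assigns each non-root node m a head heads[m].
    for tail in itertools.product(range(n), repeat=n - 1):
        heads = (None,) + tail  # node 0 is the root
        if any(h == m for m, h in enumerate(heads) if m > 0):
            continue  # no self-loops
        # Acyclicity/connectivity: every node's head chain must reach 0.
        ok = True
        for m in range(1, n):
            seen, h = set(), m
            while h != 0:
                if h in seen:
                    ok = False
                    break
                seen.add(h)
                h = heads[h]
            if not ok:
                break
        if not ok:
            continue
        # Root constraint: exactly one edge out of node 0.
        if sum(1 for m in range(1, n) if heads[m] == 0) != 1:
            continue
        score = sum(weights[heads[m]][m] for m in range(1, n))
        trees.append((score, heads))
    trees.sort(key=lambda t: -t[0])
    return trees[:K]
```

On a three-node graph this filters out both multi-root trees and cyclic head assignments before ranking the survivors by total edge weight.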
Higher-order Derivatives of Weighted Finite-state Machines
Ran Zmigrod

Tim Vieira

Ryan Cotterell
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Weighted finite-state machines are a fundamental building block of NLP systems. They have withstood the test of time—from their early use in noisy channel models in the 1990s up to modern-day neurally parameterized conditional random fields. This work examines the computation of higher-order derivatives with respect to the normalization constant for weighted finite-state machines. We provide a general algorithm for evaluating derivatives of all orders, which has not been previously described in the literature. In the case of second-order derivatives, our scheme runs in the optimal O(A^2 N^4) time, where A is the alphabet size and N is the number of states. Our algorithm is significantly faster than prior algorithms. Additionally, our approach leads to a significantly faster algorithm for computing second-order expectations, such as covariance matrices and gradients of first-order expectations.
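The link between second-order derivatives of the log-normalizer and covariance matrices can be checked by hand on a toy machine (a brute-force enumeration sketch, not the paper's efficient algorithm; the one-state, fixed-length machine here is an illustrative assumption):

```python
import itertools
import math

def logZ_and_moments(theta, length=2):
    """Enumerate all paths of a toy one-state machine that reads
    strings of a fixed length over the alphabet, and return log Z plus
    the mean and covariance of symbol counts under the distribution
    p(path) proportional to exp(sum of theta over the path's symbols).
    The Hessian of log Z w.r.t. theta equals this covariance.
    """
    syms = sorted(theta)
    Z = 0.0
    raw_mean = {s: 0.0 for s in syms}
    raw_second = {(s, t): 0.0 for s in syms for t in syms}
    for path in itertools.product(syms, repeat=length):
        w = math.exp(sum(theta[s] for s in path))
        Z += w
        counts = {s: path.count(s) for s in syms}
        for s in syms:
            raw_mean[s] += w * counts[s]
            for t in syms:
                raw_second[s, t] += w * counts[s] * counts[t]
    mean = {s: raw_mean[s] / Z for s in syms}
    cov = {(s, t): raw_second[s, t] / Z - mean[s] * mean[t]
           for s in syms for t in syms}
    return math.log(Z), mean, cov
```

For a two-symbol alphabet and length 2, the count of symbol `a` is Binomial(2, p) with p the softmax probability of `a`, so the diagonal covariance entry should come out to 2p(1-p).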
2020
The Universal Decompositional Semantics Dataset and Decomp Toolkit
Aaron Steven White

Elias Stengel-Eskin

Siddharth Vashishtha

Venkata Subrahmanyan Govindarajan

Dee Ann Reisinger

Tim Vieira

Keisuke Sakaguchi

Sheng Zhang

Francis Ferraro

Rachel Rudinger

Kyle Rawlins

Benjamin Van Durme
Proceedings of the 12th Language Resources and Evaluation Conference
We present the Universal Decompositional Semantics (UDS) dataset (v1.0), which is bundled with the Decomp toolkit (v0.1). UDS1.0 unifies five high-quality, decompositional semantics-aligned annotation sets within a single semantic graph specification—with graph structures defined by the predicative patterns produced by the PredPatt tool and real-valued node and edge attributes constructed using sophisticated normalization procedures. The Decomp toolkit provides a suite of Python 3 tools for querying UDS graphs using SPARQL. Both UDS1.0 and Decomp0.1 are publicly available at http://decomp.io.
Best-First Beam Search
Clara Meister

Tim Vieira

Ryan Cotterell
Transactions of the Association for Computational Linguistics, Volume 8
Decoding for many NLP tasks requires an effective heuristic algorithm for approximating exact search because the problem of searching the full output space is often intractable, or impractical in many settings. The default algorithm for this job is beam search—a pruned version of breadth-first search. Quite surprisingly, beam search often returns better results than exact inference due to beneficial search bias for NLP tasks. In this work, we show that the standard implementation of beam search can be made up to 10x faster in practice. Our method assumes that the scoring function is monotonic in the sequence length, which allows us to safely prune hypotheses that cannot be in the final set of hypotheses early on. We devise effective monotonic approximations to popular non-monotonic scoring functions, including length normalization and mutual information decoding. Lastly, we propose a memory-reduced variant of best-first beam search, which has a similar beneficial search bias in terms of downstream performance, but runs in a fraction of the time.
If beam search is the answer, what was the question?
Clara Meister

Ryan Cotterell

Tim Vieira
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Quite surprisingly, exact maximum a posteriori (MAP) decoding of neural language generators frequently leads to low-quality results. Rather, most state-of-the-art results on language generation tasks are attained using beam search despite its overwhelmingly high search error rate. This implies that the MAP objective alone does not express the properties we desire in text, which merits the question: if beam search is the answer, what was the question? We frame beam search as the exact solution to a different decoding objective in order to gain insights into why high probability under a model alone may not indicate adequacy. We find that beam search enforces uniform information density in text, a property motivated by cognitive science. We suggest a set of decoding objectives that explicitly enforce this property and find that exact decoding with these objectives alleviates the problems encountered when decoding poorly calibrated language generation models. Additionally, we analyze the text produced using various decoding strategies and see that, in our neural machine translation experiments, the extent to which this property is adhered to strongly correlates with BLEU.
Please Mind the Root: Decoding Arborescences for Dependency Parsing
Ran Zmigrod

Tim Vieira

Ryan Cotterell
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
The connection between dependency trees and spanning trees is exploited by the NLP community to train and to decode graph-based dependency parsers. However, the NLP literature has missed an important difference between the two structures: only one edge may emanate from the root in a dependency tree. We analyzed the output of state-of-the-art parsers on many languages from the Universal Dependency Treebank: although these parsers are often able to learn that trees which violate the constraint should be assigned lower probabilities, their ability to do so unsurprisingly degrades as the size of the training set decreases. In fact, the worst constraint-violation rate we observe is 24%. Prior work has proposed an inefficient algorithm to enforce the constraint, which adds a factor of n to the decoding runtime. We adapt an algorithm due to Gabow and Tarjan (1984) to dependency parsing, which satisfies the constraint without compromising the original runtime.
2017
Learning to Prune: Exploring the Frontier of Fast and Accurate Parsing
Tim Vieira

Jason Eisner
Transactions of the Association for Computational Linguistics, Volume 5
Pruning hypotheses during dynamic programming is commonly used to speed up inference in settings such as parsing. Unlike prior work, we train a pruning policy under an objective that measures end-to-end performance: we search for a fast and accurate policy. This poses a difficult machine learning problem, which we tackle with the LOLS algorithm. LOLS training must continually compute the effects of changing pruning decisions: we show how to make this efficient in the constituency parsing setting, via dynamic programming and change propagation algorithms. We find that optimizing end-to-end performance in this way leads to a better Pareto frontier—i.e., parsers which are more accurate for a given runtime.
2016
Universal Decompositional Semantics on Universal Dependencies
Aaron Steven White

Drew Reisinger

Keisuke Sakaguchi

Tim Vieira

Sheng Zhang

Rachel Rudinger

Kyle Rawlins

Benjamin Van Durme
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
Speed-Accuracy Tradeoffs in Tagging with Variable-Order CRFs and Structured Sparsity
Tim Vieira

Ryan Cotterell

Jason Eisner
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
A Joint Model of Orthography and Morphological Segmentation
Ryan Cotterell

Tim Vieira

Hinrich Schütze
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
2015
Reasoning about Quantities in Natural Language
Subhro Roy

Tim Vieira

Dan Roth
Transactions of the Association for Computational Linguistics, Volume 3
Little work from the Natural Language Processing community has targeted the role of quantities in Natural Language Understanding. This paper takes some key steps towards facilitating reasoning about quantities expressed in natural language. We investigate two different tasks of numerical reasoning. First, we consider Quantity Entailment, a new task formulated to understand the role of quantities in general textual inference tasks. Second, we consider the problem of automatically understanding and solving elementary school math word problems. In order to address these quantitative reasoning problems we first develop a computational approach which we show to successfully recognize and normalize textual expressions of quantities. We then use these capabilities to further develop algorithms to assist reasoning in the context of the aforementioned tasks.
2012
Grammarless Parsing for Joint Inference
Jason Naradowsky

Tim Vieira

David Smith
Proceedings of COLING 2012