Please Mind the Root: Decoding Arborescences for Dependency Parsing

Ran Zmigrod, Tim Vieira, Ryan Cotterell


Abstract
The connection between dependency trees and spanning trees is exploited by the NLP community to train and to decode graph-based dependency parsers. However, the NLP literature has missed an important difference between the two structures: only one edge may emanate from the root in a dependency tree. We analyze the output of state-of-the-art parsers on many languages from the Universal Dependency Treebank: although these parsers are often able to learn that trees violating the constraint should be assigned lower probabilities, their ability to do so unsurprisingly degrades as the size of the training set decreases. In fact, the worst constraint-violation rate we observe is 24%. Prior work has proposed an inefficient algorithm to enforce the constraint, which adds a factor of n to the decoding runtime. We adapt an algorithm due to Gabow and Tarjan (1984) to dependency parsing, which satisfies the constraint without compromising the original runtime.
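As a rough illustration of the constraint the abstract describes (not code from the paper), a predicted dependency tree can be checked for root-constraint violations by counting how many tokens attach directly to the root. The representation below — a head array with 0 denoting the artificial root — is a common convention and an assumption here, not the paper's API.

```python
def violates_root_constraint(heads):
    """Return True if more than one token attaches directly to the root.

    `heads` is a list where heads[i] is the head of token i+1
    (tokens are 1-indexed; 0 denotes the artificial root node).
    A well-formed dependency tree has exactly one root attachment.
    """
    return sum(1 for h in heads if h == 0) > 1

# Tokens 1 and 3 both attach to the root: a spanning tree, but not
# a valid dependency tree under the single-root constraint.
assert violates_root_constraint([0, 1, 0])
# Exactly one token (token 2) attaches to the root: valid.
assert not violates_root_constraint([2, 0, 2])
```

Such a check only detects violations; the paper's contribution is a decoder that enforces the constraint during maximum-arborescence decoding without the extra factor of n.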
Anthology ID:
2020.emnlp-main.390
Volume:
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month:
November
Year:
2020
Address:
Online
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
4809–4819
URL:
https://aclanthology.org/2020.emnlp-main.390
DOI:
10.18653/v1/2020.emnlp-main.390
Cite (ACL):
Ran Zmigrod, Tim Vieira, and Ryan Cotterell. 2020. Please Mind the Root: Decoding Arborescences for Dependency Parsing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 4809–4819, Online. Association for Computational Linguistics.
Cite (Informal):
Please Mind the Root: Decoding Arborescences for Dependency Parsing (Zmigrod et al., EMNLP 2020)
PDF:
https://aclanthology.org/2020.emnlp-main.390.pdf
Video:
https://slideslive.com/38939014
Code:
rycolab/spanningtrees