%0 Conference Proceedings
%T Applying Occam’s Razor to Transformer-Based Dependency Parsing: What Works, What Doesn’t, and What is Really Necessary
%A Grünewald, Stefan
%A Friedrich, Annemarie
%A Kuhn, Jonas
%Y Oepen, Stephan
%Y Sagae, Kenji
%Y Tsarfaty, Reut
%Y Bouma, Gosse
%Y Seddah, Djamé
%Y Zeman, Daniel
%S Proceedings of the 17th International Conference on Parsing Technologies and the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies (IWPT 2021)
%D 2021
%8 August
%I Association for Computational Linguistics
%C Online
%F grunewald-etal-2021-applying
%X The introduction of pre-trained transformer-based contextualized word embeddings has led to considerable improvements in the accuracy of graph-based parsers for frameworks such as Universal Dependencies (UD). However, previous works differ in various dimensions, including their choice of pre-trained language models and whether they use LSTM layers. With the aims of disentangling the effects of these choices and identifying a simple yet widely applicable architecture, we introduce STEPS, a new modular graph-based dependency parser. Using STEPS, we perform a series of analyses on the UD corpora of a diverse set of languages. We find that the choice of pre-trained embeddings has by far the greatest impact on parser performance and identify XLM-R as a robust choice across the languages in our study. Adding LSTM layers provides no benefits when using transformer-based embeddings. A multi-task training setup outputting additional UD features may contort results. Taking these insights together, we propose a simple but widely applicable parser architecture and configuration, achieving new state-of-the-art results (in terms of LAS) for 10 out of 12 diverse languages.
%R 10.18653/v1/2021.iwpt-1.13
%U https://aclanthology.org/2021.iwpt-1.13
%U https://doi.org/10.18653/v1/2021.iwpt-1.13
%P 131-144