Higher-order Derivatives of Weighted Finite-state Machines
Ran
Zmigrod
author
Tim
Vieira
author
Ryan
Cotterell
author
2021-08
text
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Association for Computational Linguistics
Online
conference publication
Weighted finite-state machines are a fundamental building block of NLP systems. They have withstood the test of time—from their early use in noisy channel models in the 1990s up to modern-day neurally parameterized conditional random fields. This work examines the computation of higher-order derivatives with respect to the normalization constant for weighted finite-state machines. We provide a general algorithm for evaluating derivatives of all orders, which has not been previously described in the literature. In the case of second-order derivatives, our scheme runs in the optimal O(A\² N\⁴) time where A is the alphabet size and N is the number of states. Our algorithm is significantly faster than prior algorithms. Additionally, our approach leads to a significantly faster algorithm for computing second-order expectations, such as covariance matrices and gradients of first-order expectations.
zmigrod-etal-2021-higher
10.18653/v1/2021.acl-short.32
https://aclanthology.org/2021.acl-short.32
2021-08
240
248