Distilling weighted finite automata from arbitrary probabilistic models

Ananda Theertha Suresh, Brian Roark, Michael Riley, Vlad Schogol


Abstract
Weighted finite automata (WFA) are often used to represent probabilistic models, such as n-gram language models, since they are time- and space-efficient for recognition tasks. The probabilistic source to be represented as a WFA, however, may come in many forms. Given a generic probabilistic model over sequences, we propose an algorithm to approximate it as a weighted finite automaton such that the Kullback-Leibler divergence between the source model and the WFA target model is minimized. The proposed algorithm involves a counting step and a difference-of-convex optimization, both of which can be performed efficiently. We demonstrate the usefulness of our approach on several tasks, including distilling n-gram models from neural models.
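
The abstract describes two ingredients: a counting step and a difference-of-convex optimization that together minimize the KL divergence to the source model. As a rough, hedged illustration of the counting idea only (not the paper's actual algorithm), the Python sketch below estimates expected bigram counts from an arbitrary source model by sampling and normalizes them into a bigram model, which is representable as a WFA with one state per context. All function names and the toy sampler are hypothetical and introduced here for illustration.

```python
import random
from collections import defaultdict

def estimate_bigram_counts(sample_fn, num_samples=10000, bos="<s>", eos="</s>"):
    """Monte Carlo stand-in for a counting step: approximate expected
    bigram counts under an arbitrary source model by drawing samples from it."""
    counts = defaultdict(lambda: defaultdict(float))
    for _ in range(num_samples):
        seq = [bos] + list(sample_fn()) + [eos]
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1.0
    return counts

def normalize_to_bigram_model(counts):
    """Turn raw counts into conditional probabilities P(cur | prev).
    The result is a bigram model, i.e. a WFA with one state per context symbol."""
    return {
        prev: {cur: c / sum(nexts.values()) for cur, c in nexts.items()}
        for prev, nexts in counts.items()
    }

def toy_sampler():
    """Hypothetical source model: a tiny two-symbol Markov chain over {"a", "b"}."""
    seq, sym = [], "a"
    for _ in range(5):
        seq.append(sym)
        sym = "b" if random.random() < 0.3 else "a"
    return seq

if __name__ == "__main__":
    model = normalize_to_bigram_model(estimate_bigram_counts(toy_sampler, 1000))
    print(model)
```

Note that the paper itself performs the counting and the KL minimization efficiently with respect to the source model, using a difference-of-convex optimization; the sketch above only conveys how expected counts can feed a WFA-representable n-gram model.
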
Anthology ID:
W19-3112
Volume:
Proceedings of the 14th International Conference on Finite-State Methods and Natural Language Processing
Month:
September
Year:
2019
Address:
Dresden, Germany
Editors:
Heiko Vogler, Andreas Maletti
Venue:
FSMNLP
SIG:
SIGFSM
Publisher:
Association for Computational Linguistics
Pages:
87–97
URL:
https://aclanthology.org/W19-3112
DOI:
10.18653/v1/W19-3112
Cite (ACL):
Ananda Theertha Suresh, Brian Roark, Michael Riley, and Vlad Schogol. 2019. Distilling weighted finite automata from arbitrary probabilistic models. In Proceedings of the 14th International Conference on Finite-State Methods and Natural Language Processing, pages 87–97, Dresden, Germany. Association for Computational Linguistics.
Cite (Informal):
Distilling weighted finite automata from arbitrary probabilistic models (Suresh et al., FSMNLP 2019)
PDF:
https://aclanthology.org/W19-3112.pdf