Alexandra Butoi


2023

Efficient Algorithms for Recognizing Weighted Tree-Adjoining Languages
Alexandra Butoi | Tim Vieira | Ryan Cotterell | David Chiang
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

The class of tree-adjoining languages can be characterized by various two-level formalisms, consisting of a context-free grammar (CFG) or pushdown automaton (PDA) controlling another CFG or PDA. These four formalisms are equivalent to tree-adjoining grammars (TAG), linear indexed grammars (LIG), pushdown-adjoining automata (PAA), and embedded pushdown automata (EPDA). We define semiring-weighted versions of the above two-level formalisms, and we design new algorithms for computing their stringsums (the weight of all derivations of a string) and allsums (the weight of all derivations). From these, we also immediately obtain stringsum and allsum algorithms for TAG, LIG, PAA, and EPDA. For LIG, our algorithm is more time-efficient by a factor of 𝒪(n|𝒩|) (where n is the string length and |𝒩| is the size of the nonterminal set) and more space-efficient by a factor of 𝒪(|Γ|) (where |Γ| is the size of the stack alphabet) than the algorithm of Vijay-Shanker and Weir (1989). For EPDA, our algorithm is both more space-efficient and more time-efficient than the algorithm of Alonso et al. (2001) by factors of 𝒪(|Γ|²) and 𝒪(|Γ|³), respectively. Finally, we give the first PAA stringsum and allsum algorithms.

Convergence and Diversity in the Control Hierarchy
Alexandra Butoi | Ryan Cotterell | David Chiang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Weir has defined a hierarchy of language classes whose second member (L2) is generated by tree-adjoining grammars (TAG), linear indexed grammars (LIG), combinatory categorial grammars, and head grammars. The hierarchy is obtained using the mechanism of control, and L2 is obtained using a context-free grammar (CFG) whose derivations are controlled by another CFG. We adapt Weir’s definition of a controllable CFG (called a labeled distinguished CFG) to give a definition of controllable pushdown automata (PDAs), called labeled distinguished PDAs. This yields three new characterizations of L2 as the class of languages generated by PDAs controlling PDAs, PDAs controlling CFGs, and CFGs controlling PDAs. We show that these four formalisms are not only weakly equivalent but equivalent in a stricter sense that we call d-weak equivalence. Furthermore, using an even stricter notion of equivalence called d-strong equivalence, we make precise the intuition that a CFG controlling a CFG is a TAG, a PDA controlling a PDA is an embedded PDA, and a PDA controlling a CFG is a LIG. The fourth member of this family, a CFG controlling a PDA, does not correspond to any kind of automaton we know of, so we invent one and call it a Pushdown Adjoining Automaton (PAA).

2022

Algorithms for Weighted Pushdown Automata
Alexandra Butoi | Brian DuSell | Tim Vieira | Ryan Cotterell | David Chiang
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Weighted pushdown automata (WPDAs) are at the core of many natural language processing tasks, like syntax-based statistical machine translation and transition-based dependency parsing. As most existing dynamic programming algorithms are designed for context-free grammars (CFGs), algorithms for PDAs often resort to a PDA-to-CFG conversion. In this paper, we develop novel algorithms that operate directly on WPDAs. Our algorithms are inspired by Lang’s algorithm, but use a more general definition of pushdown automaton and either reduce the space requirements by a factor of |Γ| (the size of the stack alphabet) or reduce the runtime by a factor of more than |Q| (the number of states). When run on the same class of PDAs as Lang’s algorithm, our algorithm is both more space-efficient by a factor of |Γ| and more time-efficient by a factor of |Q| · |Γ|.