Linear Programming Decoders in Natural Language Processing: From Integer Programming to Message Passing and Dual Decomposition

André F. T. Martins


Abstract
This tutorial will cover the theory and practice of linear programming decoders. This class of decoders encompasses a variety of techniques that have enjoyed great success in devising structured models for natural language processing (NLP). Along the tutorial, we provide a unified view of different algorithms and modeling techniques, including belief propagation, dual decomposition, integer linear programming, Markov logic, and constrained conditional models. Various applications in NLP will serve as a motivation.There is a long string of work using integer linear programming (ILP) formulations in NLP, for example in semantic role labeling, machine translation, summarization, dependency parsing, coreference resolution, and opinion mining, to name just a few. At the heart of these approaches is the ability to encode logic and budget constraints (common in NLP and information retrieval) as linear inequalities. Thanks to general purpose solvers (such as Gurobi, CPLEX, or GLPK), the practitioner can abstract away from the decoding algorithm and focus on developing a powerful model. A disadvantage, however, is that general solvers do not scale well to large problem instances, since they fail to exploit the structure of the problem.This is where graphical models come into play. In this tutorial, we show that most logic and budget constraints that arise in NLP can be cast in this framework. This opens the door for the use of message-passing algorithms, such as belief propagation and variants thereof. An alternative are algorithms based on dual decomposition, such as the subgradient method or AD3. These algorithms have achieved great success in a variety of applications, such as parsing, corpus-wide tagging, machine translation, summarization, joint coreference resolution and quotation attribution, and semantic role labeling. Interestingly, most decoders used in these works can be regarded as structure-aware solvers for addressing relaxations of integer linear programs. All these algorithms have a similar consensus-based architecture: they repeatedly perform certain "local" operations in the graph, until some form of local agreement is achieved. The local operations are performed at each factor, and they range between computing marginals, max-marginals, an optimal configuration, or a small quadratic problem, all of which are commonly tractable and efficient in a wide range of problems.As a companion of this tutorial, we provide an open-source implementation of some of the algorithms described above, available at http://www.ark.cs.cmu.edu/AD3.
Anthology ID:
D14-2004
Volume:
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts
Month:
October
Year:
2014
Address:
Doha, Qatar
Editors:
Lucia Specia, Xavier Carreras
Venue:
EMNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
Language:
URL:
https://aclanthology.org/D14-2004
DOI:
Bibkey:
Cite (ACL):
André F. T. Martins. 2014. Linear Programming Decoders in Natural Language Processing: From Integer Programming to Message Passing and Dual Decomposition. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts, Doha, Qatar. Association for Computational Linguistics.
Cite (Informal):
Linear Programming Decoders in Natural Language Processing: From Integer Programming to Message Passing and Dual Decomposition (Martins, EMNLP 2014)
Copy Citation: