Unsupervised Natural Language Parsing (Introductory Tutorial)

Unsupervised parsing learns a syntactic parser from training sentences without parse tree annotations. Recently, there has been a resurgence of interest in unsupervised parsing, which can be attributed to the combination of two trends in the NLP community: a general trend towards unsupervised training or pre-training, and an emerging trend towards finding or modeling linguistic structures in neural models. In this tutorial, we will introduce to the general audience what unsupervised parsing does and how it can be useful for and beyond syntactic parsing. We will then provide a systematic overview of major classes of approaches to unsupervised parsing, namely generative and discriminative approaches, and analyze their relative strengths and weaknesses. We will cover both decades-old statistical approaches and more recent neural approaches to give the audience a sense of the historical and recent development of the field. We will also discuss emerging research topics such as BERT-based approaches and visually grounded learning.


Introduction
Syntactic parsing is an important task in natural language processing that aims to uncover the syntactic structure (e.g., a constituent or dependency tree) of an input sentence. Such syntactic structures have been found useful in downstream tasks such as semantic parsing, relation extraction, and machine translation.
Supervised learning is the main technique used to automatically learn a syntactic parser from data. It requires the training sentences to be manually annotated with their correct parse trees. A major challenge faced by supervised parsing is that syntactically annotated sentences are not always available for a target language or domain and building a high-quality annotated corpus is very expensive and time-consuming.
A radical solution to this challenge is unsupervised parsing, sometimes also called grammar induction, which learns a parser from training sentences without parse tree annotations. Unsupervised parsing can also serve as the basis for semi-supervised and transfer learning of syntactic parsers when there exist both unannotated sentences and (in-domain or out-of-domain) annotated sentences. In addition, research on unsupervised parsing is deemed interesting in the field of machine learning because it is a representative task of unsupervised structured prediction, and in the field of cognitive science because it inspires and verifies cognitive research on human language acquisition.
The research on unsupervised parsing has a long history, dating back to theoretical studies in the 1960s (Gold, 1967) and algorithmic and empirical studies in the 1970s (Baker, 1979). Although deemed an interesting topic by the NLP community, unsupervised parsing had received much less attention than supervised parsing over the past few decades.
More recently, however, there has been a resurgence of interest in unsupervised parsing, with more than ten papers on unsupervised parsing published in top NLP and AI venues over the past two years, including a best paper at ICLR 2019 (Shen et al., 2019), a best paper nominee at ACL 2019 (Shi et al., 2019), and a best paper nominee at EMNLP 2020 (Zhao and Titov, 2020). This renewed interest in unsupervised parsing can be attributed to the combination of two recent trends. First, there is a general trend in deep learning towards unsupervised training or pre-training. Second, there is an emerging trend in the NLP community towards finding or modeling linguistic structures in neural models. The research on unsupervised parsing fits these two trends perfectly.
Because of the renewed attention on unsupervised parsing and its relevance to the recent trends in the NLP community, we believe a tutorial on unsupervised parsing can be timely and beneficial to many *ACL conference attendees. The tutorial will introduce to the general audience what unsupervised parsing does and how it can be useful for and beyond syntactic parsing. It will then provide a systematic overview of major classes of approaches to unsupervised parsing, namely generative and discriminative approaches, and analyze their relative strengths and weaknesses. It will cover both decades-old statistical approaches and more recent neural approaches to give the audience a sense of the historical and recent development of the field. We also plan to discuss emerging research topics such as BERT-based approaches and visually grounded learning.
We expect that by taking this tutorial, attendees will not only obtain a deep understanding of the literature and methodology of unsupervised parsing and become well prepared for their own research into unsupervised parsing, but may also draw inspiration from the ideas and techniques of unsupervised parsing and apply or extend them to other NLP tasks that can potentially benefit from implicitly learned linguistic structures.

Overview
This will be a three-hour tutorial divided into five parts.
In the first part, we will introduce the unsupervised parsing task. We will start with the problem definition and discuss the motivations and applications of unsupervised parsing. For example, we will show that unsupervised parsing approaches can be extended for semi-supervised parsing (Jia et al., 2020) and cross-lingual syntactic transfer (He et al., 2019), and we will also show applications of unsupervised parsing approaches beyond syntactic parsing (e.g., in computer vision (Tu et al., 2013)). We will then discuss how to evaluate unsupervised parsing, including the evaluation metrics and typical experimental setups. We will promote standardized setups to enable meaningful empirical comparison between approaches (Li et al., 2020). Finally, we will give an overview of unsupervised parsing approaches to be discussed in the rest of the tutorial.
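To make the evaluation discussion concrete, below is a minimal Python sketch (our illustration, not material from any cited paper) of the most common metric for unsupervised constituency parsing: unlabeled bracket F1 between predicted and gold constituent spans. Details such as whether trivial spans are excluded and whether F1 is averaged per sentence or computed over the whole corpus are exactly the kinds of setup choices that standardized evaluation aims to pin down.

```python
def unlabeled_f1(pred_spans, gold_spans):
    """Unlabeled precision, recall, and F1 for a single sentence.

    pred_spans, gold_spans: iterables of (start, end) constituent spans;
    trivial spans (e.g., the whole-sentence span) are often removed beforehand.
    """
    pred_spans, gold_spans = set(pred_spans), set(gold_spans)
    if not pred_spans or not gold_spans:
        return 0.0, 0.0, 0.0
    correct = len(pred_spans & gold_spans)
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    f1 = 0.0 if correct == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Example: gold tree ((the cat) (sat (on (the mat)))) vs. a purely right-branching prediction.
gold = {(0, 2), (2, 6), (3, 6), (4, 6)}
pred = {(1, 6), (2, 6), (3, 6), (4, 6)}
print(unlabeled_f1(pred, gold))  # (0.75, 0.75, 0.75)
```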
In the second and third parts, we will introduce in detail two major classes of approaches to unsupervised parsing, generative and discriminative approaches, and discuss their pros and cons.
The second part will cover generative approaches, which model the joint probability of the sentence and the corresponding parse tree. Most existing generative approaches are based on generative grammars, in particular context-free grammars and the dependency model with valence (Klein and Manning, 2004). There are also featurized and neural extensions of generative grammars, such as Jiang et al. (2016). We will divide our discussion of learning generative grammars into two parts: structure learning and parameter learning. Structure learning concerns finding the optimal set of grammar rules. We will introduce both probabilistic methods such as Stolcke and Omohundro (1994) and heuristic methods such as Clark (2007). Parameter learning concerns learning the probabilities or weights of a pre-specified set of grammar rules. We will discuss a variety of priors and regularizations designed to improve parameter learning, such as Cohen and Smith (2010), Tu and Honavar (2012), Noji et al. (2016), and Jin et al. (2018).
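As a concrete anchor for this part (a sketch in standard notation, not the exact formulation of any single cited paper), these generative approaches share the objective of maximizing the marginal likelihood of the observed sentences, with the parse tree treated as a latent variable:

\[
\theta^{*} \;=\; \arg\max_{\theta} \sum_{i=1}^{N} \log p_{\theta}\big(x^{(i)}\big)
\;=\; \arg\max_{\theta} \sum_{i=1}^{N} \log \sum_{t \in \mathcal{T}(x^{(i)})} p_{\theta}\big(x^{(i)}, t\big),
\]

where \(x^{(i)}\) is a training sentence, \(\mathcal{T}(x^{(i)})\) the set of its candidate parse trees, and \(p_{\theta}(x, t)\) factorizes over grammar rules (for a probabilistic context-free grammar, the product of the probabilities of the rules used in \(t\)). Expectation-maximization, for instance, alternates between computing expected rule counts under \(p_{\theta}(t \mid x)\) with the inside-outside algorithm (E-step) and renormalizing those counts into new rule probabilities (M-step); at test time one typically outputs \(\arg\max_{t} p_{\theta}(x, t)\), e.g., via CKY.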
We will also discuss parameter learning algorithms such as expectation-maximization (Baker, 1979; Spitkovsky et al., 2010b), MCMC (Johnson et al., 2007), and curriculum learning (Spitkovsky et al., 2010a). After introducing approaches based on generative grammars, we will discuss recent approaches that are instead based on neural language models (Shen et al., 2018, 2019).
The third part will cover discriminative approaches, which model the conditional probability or score of the parse tree given the sentence. We will first introduce autoencoder approaches such as Cai et al. (2017), which contain an encoder that maps the sentence to an intermediate representation (such as a parse tree) and a decoder that tries to reconstruct the sentence; their training objective is typically the reconstruction probability. We will then introduce variational autoencoder approaches such as Kim et al. (2019), which have a similar model structure to autoencoder approaches but use the evidence lower bound as the training objective. Finally, we will briefly discuss other discriminative approaches such as Grave and Elhadad (2015).
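For concreteness (again a sketch in standard notation rather than the exact formulation of any cited paper), the variational autoencoder approaches above train an encoder \(q_{\phi}(t \mid x)\) and a decoder \(p_{\theta}(x \mid t)\) by maximizing the evidence lower bound on the sentence log-likelihood:

\[
\log p_{\theta}(x) \;\geq\; \mathbb{E}_{q_{\phi}(t \mid x)}\big[\log p_{\theta}(x \mid t)\big] \;-\; \mathrm{KL}\big(q_{\phi}(t \mid x)\,\|\,p_{\theta}(t)\big),
\]

where \(t\) ranges over parse trees and \(p_{\theta}(t)\) is a prior over trees; plain autoencoder approaches can be viewed as optimizing only the reconstruction term. Because \(t\) is a discrete structure, the expectation is typically handled with dynamic programming over trees or with sampling-based gradient estimators, and the cited papers differ in exactly these choices.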
In the fourth part, we will focus on several special topics. First, while most previous approaches to unsupervised parsing are unlexicalized, we will discuss the impact of partial and full lexicalization (e.g., the work by Pate and Johnson (2016) and Han et al. (2017)). Second, we will discuss whether and how big training data could benefit unsupervised parsing (Han et al., 2017). Third, we will introduce recent attempts to induce syntactic parses from pretrained language models such as BERT (Rosa and Mareček, 2019; Wu et al., 2020). Fourth, we will cover unsupervised multilingual parsing, the task of performing unsupervised parsing jointly on multiple languages (e.g., the work by Han et al. (2019)). Fifth, we will introduce visually grounded unsupervised parsing, which tries to improve unsupervised parsing with the help of visual data (Shi et al., 2019). Finally, we will discuss latent tree models trained with feedback from downstream tasks, which are related to unsupervised parsing (Yogatama et al., 2016; Choi et al., 2018).
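Several of these special topics (and the neural language model approaches above) extract trees from a trained model by scoring how strongly adjacent words should be separated. The following is a minimal, generic Python sketch of that recipe (our illustration with hand-picked scores, not any cited paper's method): assign a "syntactic distance" to each gap between adjacent words, e.g., derived from the dissimilarity of the model's representations of the two words, then split the sentence top-down at the largest remaining score.

```python
def build_tree(words, distances):
    """words: list of n tokens; distances: list of n-1 gap scores (higher = stronger boundary)."""
    if len(words) == 1:
        return words[0]
    # Split at the gap with the largest score; recurse on the two halves.
    split = max(range(len(distances)), key=lambda i: distances[i]) + 1
    left = build_tree(words[:split], distances[:split - 1])
    right = build_tree(words[split:], distances[split:])
    return (left, right)

# Toy example with hand-picked gap scores (in practice they would come from the model).
print(build_tree(["the", "cat", "sat", "on", "the", "mat"],
                 [0.3, 0.9, 0.5, 0.2, 0.1]))
# (('the', 'cat'), ('sat', ('on', ('the', 'mat'))))
```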
In the last part, we will summarize the tutorial and discuss potential future research directions of unsupervised parsing.

Prerequisites for the Attendees
Linguistics: Familiarity with grammars and syntactic parsing.
Machine Learning: Basic knowledge about generative vs. discriminative models, unsupervised learning algorithms (such as expectation-maximization), and deep learning.

Reading List
Klein and Manning (2004) - An influential generative approach to unsupervised dependency parsing that is the basis for many subsequent papers.
Jiang et al. (2016) - A neural extension of Klein and Manning (2004). One of the first modern neural approaches to unsupervised parsing.
Stolcke and Omohundro (1994) - One of the first structure learning approaches of context-free grammars for unsupervised constituency parsing.