Syntax in End-to-End Natural Language Processing

This tutorial surveys the latest technical progress of syntactic parsing and the role of syntax in end-to-end natural language processing (NLP) tasks, in which semantic role labeling (SRL) and machine translation (MT) are the representative NLP tasks that have long benefited from informative syntactic clues, although the advance of end-to-end deep learning models has produced new results. In this tutorial, we will first introduce the background and the latest progress of syntactic parsing and SRL/MT. Then, we will summarize the key evidence about the syntactic impact on these two tasks, and explore the underlying reasons from both computational and linguistic perspectives.


Tutorial Content
Syntax describes the formal arrangement of words within a language; its mathematical formalization was pioneered by Chomsky (1957). Syntactic parsing has made significant progress since deep learning was fully introduced into natural language processing (NLP). We identify two development stages of parsing techniques according to whether deep learning is involved. For parsers built on traditional machine learning models, most work focused on designing better search algorithms or better structural modeling of syntax, while few ever considered feature engineering. For parsers using deep learning models, most work has turned to more effective and more salient representations, while following the same structural formalization as in the era of traditional parsers.
We have observed a series of significant performance improvements since 2014 (Chen and Manning, 2014; Dozat and Manning, 2017). In this part, we will survey the key improvements in language representation for syntactic parsing.
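As a concrete illustration of this representational turn, below is a minimal sketch of a biaffine arc scorer in the style of Dozat and Manning (2017), written in PyTorch. The class name, layer sizes, and initialization are illustrative assumptions, not the authors' exact configuration.

```python
# A minimal sketch of biaffine arc scoring for dependency parsing,
# in the style of Dozat and Manning (2017). Dimensions are illustrative.
import torch
import torch.nn as nn

class BiaffineArcScorer(nn.Module):
    def __init__(self, hidden_dim: int = 400, arc_dim: int = 200):
        super().__init__()
        # Separate MLPs specialize encoder states into head and
        # dependent representations before the biaffine product.
        self.head_mlp = nn.Sequential(nn.Linear(hidden_dim, arc_dim), nn.ReLU())
        self.dep_mlp = nn.Sequential(nn.Linear(hidden_dim, arc_dim), nn.ReLU())
        # Bilinear weight plus a head-side bias term.
        self.W = nn.Parameter(torch.randn(arc_dim, arc_dim) * 0.01)
        self.b = nn.Parameter(torch.zeros(arc_dim))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        """h: encoder states of shape (batch, seq_len, hidden_dim).
        Returns arc scores of shape (batch, seq_len, seq_len), where
        scores[b, i, j] rates token j as the head of token i."""
        head = self.head_mlp(h)                          # (batch, n, arc_dim)
        dep = self.dep_mlp(h)                            # (batch, n, arc_dim)
        bilinear = dep @ self.W @ head.transpose(1, 2)   # (batch, n, n)
        bias = head @ self.b                             # (batch, n)
        return bilinear + bias.unsqueeze(1)
```

For instance, `BiaffineArcScorer()(torch.randn(2, 10, 400))` yields a (2, 10, 10) score tensor over candidate heads; training then applies per-token softmax cross-entropy over head positions.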
In general, syntactic information contributes to other end-to-end NLP tasks, such as SRL and MT. We summarize the contribution of syntax to SRL and MT in Table 1.

Table 1: Role of different technical factors for the three NLP tasks. "++" denotes a significant performance contribution when the factor is used alone; "+" denotes a moderate contribution; "0" denotes factors mainly studied in zero/low-resource scenarios; "-" denotes negative or little impact. The mark in the rightmost column indicates whether the factors are overall effective when all marked factors to the left are combined.

Syntax in SRL. SRL, or semantic parsing, emerged as a computational task with the release of semantically annotated datasets over the past two decades, most notably PropBank (Palmer et al., 2005). During treebank annotation, semantic labels may be naturally assigned to syntactic constituents, so it is plausible that the latter helps the former, both as linguistic explanation and within the machine learning procedure. For traditional models, whether or not syntactic information is used may change SRL performance by about 5-10%. However, new results have emerged since end-to-end SRL was proposed. Nearly all state-of-the-art SRL models, whether span- or dependency-based, have used an LSTM backbone since Zhou and Xu (2015a). We attribute this change in the role of syntax to the effective distributional and contextualized representations that the LSTM builds on top of word embeddings. Note that word embeddings may carry both syntactic and semantic information.
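To make this syntax-agnostic backbone concrete, below is a minimal sketch of a BIO-tagging SRL model in the spirit of Zhou and Xu (2015a). Note that their original model stacks alternating-direction LSTMs and uses additional features; this sketch substitutes a standard bidirectional stack, and all names and sizes are illustrative assumptions.

```python
# A minimal sketch of a syntax-agnostic neural SRL tagger, loosely in
# the spirit of Zhou and Xu (2015a). A standard BiLSTM replaces their
# alternating-direction stack; sizes are illustrative.
import torch
import torch.nn as nn

class BiLSTMSRL(nn.Module):
    def __init__(self, vocab_size: int, num_tags: int,
                 emb_dim: int = 100, hidden_dim: int = 300):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, emb_dim)
        # A binary feature marking the predicate position is the only
        # "structural" input; no parse tree is consumed anywhere.
        self.pred_emb = nn.Embedding(2, emb_dim)
        self.encoder = nn.LSTM(2 * emb_dim, hidden_dim, num_layers=4,
                               bidirectional=True, batch_first=True)
        self.scorer = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, words: torch.Tensor, is_pred: torch.Tensor) -> torch.Tensor:
        """words, is_pred: (batch, seq_len) token ids / 0-1 indicators.
        Returns per-token scores over BIO argument tags."""
        x = torch.cat([self.word_emb(words), self.pred_emb(is_pred)], dim=-1)
        h, _ = self.encoder(x)
        return self.scorer(h)
```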
Since the methods of Zhou and Xu (2015b) and Marcheggiani et al. (2017), deep-learning-based SRL has obtained much less contribution from syntactic input.
For either span or dependency SRL, deep models receive less than a 2% performance improvement even when perfect syntax (gold syntax labels) is introduced, as shown by He et al. (2017a) and He et al. (2018a). We re-implemented the model of Li et al. (2019) and introduced a syntactic constraint on its span selection using a strong parser; the results indicate that stronger syntax-agnostic models receive less enhancement from syntactic information.
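To illustrate what such a syntactic constraint can look like, the sketch below prunes candidate argument spans to those matching constituent boundaries from a parse tree. This is our own minimal reading of a span-level constraint, not the released code of Li et al. (2019); the tree is assumed to be an NLTK-style nested structure whose leaves are token strings.

```python
# A minimal sketch of pruning SRL argument-span candidates with
# constituent boundaries from a (strong or gold) parser. This is an
# illustrative reading, not the authors' released implementation.
from typing import List, Set, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices

def constituent_spans(tree) -> Set[Span]:
    """Collect the (start, end) span of every constituent in an
    NLTK-style tree whose leaves are the sentence tokens."""
    spans: Set[Span] = set()

    def walk(node, offset: int) -> int:
        if isinstance(node, str):          # leaf: consumes one token
            return offset + 1
        start = offset
        for child in node:                 # internal node: recurse
            offset = walk(child, offset)
        spans.add((start, offset - 1))
        return offset

    walk(tree, 0)
    return spans

def prune_candidates(candidates: List[Span], tree) -> List[Span]:
    """Keep only candidate argument spans that align with some
    constituent; the better the parse, the tighter the filter."""
    allowed = constituent_spans(tree)
    return [span for span in candidates if span in allowed]
```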
Syntax in MT. Like SRL, MT has undergone a methodology change, from statistical machine translation (SMT) (Brown et al., 1993) to neural machine translation (NMT) (Sutskever et al., 2014; Bahdanau et al., 2015). For typical SMT, besides phrase-based SMT (Och et al., 1999; Koehn et al., 2003), syntax-based (tree-based) methods have been well developed (Yamada and Knight, 2001; Mi et al., 2008). In some scenarios, especially when the domain of the MT corpus is similar to the domain of the parsing corpus, tree-based SMT outperforms phrase-based SMT (Koehn, 2009). NMT has achieved significant progress through end-to-end architectures since 2014 (Sutskever et al., 2014; Bahdanau et al., 2015). Recently, the self-attention-based Transformer (Vaswani et al., 2017) has become the state-of-the-art architecture in NMT and has set a series of new benchmarks.
Linguistics in MT. In addition, we will investigate why linguistic cognition and prior knowledge can enhance control over the dominant end-to-end neural framework, so that translation between a language pair proceeds in an expected and interpretable way. On the one hand, linguistic cognition enables a translation model (1) to reduce translation errors that violate common sense, such as over/under-translation problems (Tu et al., 2016; see the coverage sketch after this discussion), troublesome word modeling, and so on; and (2) to acquire some basic abilities of a human translator, for example, word importance modeling (Chen et al., 2020), translation refinement (Song et al., 2020), structured information, diverse features (Chen et al., 2020), and so on.
On the other hand, linguistic prior knowledge (e.g., alignments, bilingual lexicons, phrase tables, and knowledge graphs) can alleviate the problem of inadequate target translations caused by the language-model property of the encoder-decoder framework (Feng et al., 2017; Wang et al., 2018b). Moreover, modeling linguistic differences between the source and target languages can yield natural language representations that are easier for the translation model to exploit, for example, word order differences (Chen et al., 2019; Ding et al., 2020), morphological differences (Ji et al., 2019), and so on. Meanwhile, features shared between the source and target languages can also enhance the understanding and generation of natural language in MT, for example, shared words (Artetxe et al., 2018), image information (Yin et al., 2020), video information (Wang et al., 2020), and so on.
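Returning to the over/under-translation problem mentioned above, the sketch below accumulates attention mass per source token and penalizes totals far from one. This is a deliberately simplified training-time variant written for illustration only; Tu et al. (2016) instead maintain a recurrent coverage vector inside the decoder.

```python
# A simplified sketch of a coverage-style penalty against over- and
# under-translation. NOTE: Tu et al. (2016) track coverage recurrently
# inside the decoder; this flat penalty is only an approximation.
import torch

def coverage_penalty(attn: torch.Tensor, src_mask: torch.Tensor) -> torch.Tensor:
    """attn: attention weights of shape (batch, tgt_len, src_len),
    with each decoding step summing to 1 over source positions.
    src_mask: (batch, src_len), 1 for real tokens, 0 for padding.
    Penalizes source tokens whose accumulated attention is far from 1,
    i.e., tokens that are over- or under-translated."""
    coverage = attn.sum(dim=1)                        # (batch, src_len)
    sq_err = (coverage - 1.0) ** 2 * src_mask         # ignore padding
    return sq_err.sum() / src_mask.sum()
```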

Relevance to the Computational Linguistics Community
The topics of this tutorial, i.e., syntactic parsing, SRL, and MT, are all classic ones in the NLP/CL community. This tutorial is primarily aimed at researchers who have a basic understanding of deep-learning-based NLP. We believe that this tutorial will help the audience understand more deeply the relationship among three classic NLP tasks, i.e., syntactic parsing, SRL, and MT.

Tutorial Outlines
We will present our tutorial in three hours. The detailed tutorial outline is shown in Table 2.

Breadth
20-30% of the tutorial covers work by the tutorial presenters, and 70-80% covers work by other researchers. Representative works covered include:
• Syntactic Parsing: Deep biaffine attention for neural dependency parsing (Dozat and Manning, 2016) and Constituency parsing with a self-attentive encoder (Kitaev and Klein, 2018).
• SRL: Syntax for semantic role labeling, to be, or not to be, and Deep semantic role labeling: What works and what's next (He et al., 2017b).

Diversity Considerations

Presenters
1. Dr. Hai Zhao, Professor, Shanghai Jiao Tong University (SJTU), China
cn/~zhaohai
His research interest is natural language processing. He has published more than 120 papers at ACL, EMNLP, COLING, ICLR, AAAI, and IJCAI, and in IEEE TKDE/TASLP. He has won first place in several NLP shared tasks, such as CoNLL and SIGHAN Bakeoff, and achieved top rankings on notable machine reading comprehension leaderboards such as SQuAD 2.0 and RACE.
He has taught the course "Natural Language Processing" at SJTU for more than 10 years. He served as an ACL-2017 area chair on parsing and as an ACL-2018/2019 (senior) area chair on morphology and word segmentation.
2. Dr. Rui Wang, Tenured Researcher, Advanced Translation Technology Laboratory, National Institute of Information and Communications Technology (NICT), Japan
wangrui.nlp@gmail.com
https://wangruinlp.github.io
His research focuses on machine translation (MT), a classic task in NLP. His recent interests are both traditional linguistics-based and cutting-edge machine-learning-based approaches to MT. As first or corresponding author, he has published more than 30 MT papers at top-tier NLP/ML/AI conferences and in journals, such as ACL, EMNLP, ICLR, AAAI, IJCAI, and IEEE/ACM transactions. He has also won several first places in top-tier MT shared tasks, such as WMT-2018, WMT-2019, and WMT-2020. He has given several tutorials and invited talks.

Previous Venues and Approximate Audience Sizes
Some previous tutorials have focused on single NLP tasks, such as NMT at ACL-2016 and IJCNLP-2017, and semantic parsing at ACL-2018. In particular, the NMT tutorial at ACL-2016 (with around 800 conference registrations) attracted around 150 attendees, and the one at IJCNLP-2017 (with around 300 registrations) attracted around 40 attendees. Our tutorial will be the first to explore the impact of syntax on end-to-end NLP tasks. As our topic is rather broad, we expect this tutorial to attract around 100-200 attendees.

Special Requirements
None

Preferable Venue(s)