Coordinate Constructions in English Enhanced Universal Dependencies: Analysis and Computational Modeling

In this paper, we address the representation of coordinate constructions in Enhanced Universal Dependencies (UD), where relevant dependency links are propagated from conjunction heads to other conjuncts. English treebanks for enhanced UD have been created from gold basic dependencies using a heuristic rule-based converter, which propagates only core arguments. With the aim of determining which set of links should be propagated from a semantic perspective, we create a large-scale dataset of manually edited syntax graphs. We identify several systematic errors in the original data, and propose to also propagate adjuncts. We observe high inter-annotator agreement for this semantic annotation task. Using our new manually verified dataset, we perform the first principled comparison of rule-based and (partially novel) machine-learning based methods for conjunction propagation for English. We show that learning propagation rules is more effective than hand-designing heuristic rules. When using automatic parses, our neural graph-parser based edge predictor outperforms the currently predominant pipelines using a basic-layer tree parser plus converters.


Introduction
The Universal Dependencies (UD) formalism (de Marneffe et al., 2014) is a framework for representing syntactic dependencies between words, prioritizing links between content words. UD parses provide two levels of analysis. Basic dependencies form standard syntactic dependency trees in which each node has exactly one governor (black links on top in Figure 1). Enhanced dependencies (Schuster and Manning, 2016) are extensions of these trees including additional relations (blue links below the sentence) with the aim of representing linguistic phenomena such as coordination, control, or relative clauses. They have been shown to provide valuable input for information extraction tasks (Schuster et al., 2017). One of the most frequent phenomena addressed by enhanced UD is coordination. In the English Web Treebank (EWT), more than 15% of all sentences contain conjoined verbs. Hence, a good representation of coordination is clearly crucial for downstream tasks. For example, in Figure 1, the enhanced layer explicitly captures that the arguments of the predicate "wrote" also fill the corresponding slots of "published," which is highly relevant for natural language understanding tasks.

[Figure 1 example sentence: "In 1594 PEREZ wrote and published a book."]
In many cases, enhanced representations can be derived from the gold basic layer in a rule-based fashion (Schuster and Manning, 2016). The currently available English enhanced UD treebanks have been created by applying such a converter. However, we are not aware of a large study regarding their correctness and completeness. Focusing on precision, the converter only propagates core arguments. In this paper, we take a complementary approach, performing a large-scale annotation study in order to determine which set of links should be propagated from a semantic perspective. On a new dataset of 1,417 sentences from the EWT containing conjoined verbs, we verify and if necessary modify/extend the links involved in coordinate constructions. We argue that adjuncts such as obliques should in fact be propagated at times, e.g., in Figure 1, the additional (green dotted) link that we propose to add facilitates answering questions like "When was the book published?". To the best of our knowledge, our work constitutes the first large-scale annotation effort of this kind.
On the basis of our new dataset, we make the following contributions. First, we estimate the degree of correctness and completeness of the rule-based converter/existing treebanks. We find that the converter usually generates correct graphs when applied to gold basic trees, with some notable exceptions involving non-parallel syntactic constructions (e.g., conjuncts having different voice or mood). In addition, the converter does not propagate links correctly in the presence of multiple interacting conjunctions. Our inter-annotator agreement study shows high overlap for propagation decisions, with F1 between pairs of annotators of about 0.9 on average and around 0.75 for obliques.
Second, we address the question of how to create high-quality treebanks for enhanced UD from gold basic dependencies, again focusing on coordinate constructions. Based on the findings of our corpus study, we improve the rule-based converter by Schuster and Manning (2016). We also compare machine-learning (ML) based conjunction propagation classifiers in the form of (a) SVM-based classifiers as previously used for Finnish, Swedish and Italian (Nyblom et al., 2013), and (b) a novel neural approach integrating tree- and RoBERTa-based features. We find that all systems mostly rely on tree-based features, but contextual embeddings also provide useful information. Propagation decisions reach a promising F1 of around 0.9, already similar to human agreement. ML-based classifiers outperform the rule-based converters on the EWT test set.
Third, we compare methods for extracting propagated dependencies in an automatic parsing setting. The currently predominant approach is to run a basic-layer tree parser and then the same converter that has been used for gold standard construction. We propose to use a neural graph-parser based edge predictor with an architecture similar to Dozat and Manning (2018) instead, and show that this approach outperforms pipelines by around 9 points F1 on propagating links in conjunctions.
In sum, our contributions include: (1) a manually curated large-scale dataset of 1,417 sentences addressing semantically motivated correct and complete conjunction propagation in enhanced UD; (2) the proposal of novel neural approaches to conjunction propagation; and (3) experimental evidence that these models outperform rule- and pipeline-based approaches in both gold standard treebank enhancing and automatic parsing settings. To the best of our knowledge, our work constitutes the first principled comparison of various approaches to propagating conjunctions in enhanced UD on manually corrected gold standard data for English. Both our model implementations and the dataset are freely available. 1 We will contribute our changes to the EWT corpus to the next UD release.

Related Work
Coordinate Constructions in UD are represented using the conj relation, with the first conjunct being the head to which all dependencies of the phrase are attached (see Figure 1). In the basic layer, all governors and dependents of a conjoined phrase are attached to the first conjunct. In the enhanced layer, relations are propagated to the dependent if suggested by the semantics of the sentence. 2 Schuster and Manning (2016) present an algorithm for creating enhanced dependencies automatically based on the basic layer. While it propagates links with high precision, it propagates only core arguments by design (see Appendix A). In addition, it is highly reliant on correct basic dependencies (see Sec. 5).
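In CoNLL-U files, the enhanced layer described above is stored in the DEPS column as `head:label` pairs separated by `|`. A minimal sketch of reading this field (the token indices and example values are hypothetical, not taken from the treebank):

```python
def parse_deps(deps_field):
    """Parse a CoNLL-U enhanced-dependencies (DEPS) field, e.g.
    '2:obj|4:obj', into a list of (head, relation) pairs."""
    if deps_field == "_":          # no enhanced dependencies annotated
        return []
    pairs = []
    for item in deps_field.split("|"):
        head, rel = item.split(":", 1)   # labels may carry subtypes, e.g. obl:in
        pairs.append((head, rel))        # head kept as string: empty nodes use ids like "8.1"
    return pairs

# A conjoined object shared by two verbs is attached to both conjuncts:
shared = parse_deps("2:obj|4:obj")
subtyped = parse_deps("8:obl:in")
```

For a noun governed by both conjoined verbs, the field yields one pair per governor, which is exactly the propagated link this paper is concerned with.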
Conjunction propagation classifiers. Nyblom et al. (2013) present an SVM-based approach for enhancing Finnish syntax trees. They observe high performance on conjunction propagation when operating on gold basic trees, but markedly worse results when using automatic parser output. A similar approach has been evaluated for Swedish and Italian. We show that this approach also works well for English, and extend it with neural models and contextualized word embeddings. Other work de-lexicalizes a rule-based converter developed for Italian, showing that the language-independent system also correctly produces most of the propagations for English. However, that evaluation uses EWT, which is itself the result of a rule-based system. In contrast, we evaluate on manually checked gold data.
For the related task of dealing with gapping constructions such as "Paul likes coffee and Mary tea," prior work reconstructs elided predicates by first parsing into an intermediate representation and then applying either a rule-based or an ML-based algorithm to copy over lexical material. We here focus on dependency propagation and operate on gold tokens as annotated in the enhanced UD treebanks, which already include traces. Other related work exists in the area of manual and rule-based error correction on UD treebanks (Wisniewski, 2018; Alzetta et al., 2018).
There is still little published work regarding fully automatic enhanced UD parsing; however, the topic has recently been addressed by the IWPT 2020 Shared Task (Bouma et al., 2020). Among the top-performing systems, several approaches first parse into basic UD and then apply transformation rules (e.g., Heinecke, 2020; Dehouck et al., 2020). Others directly employ graph parsing techniques (e.g., Wang et al., 2020; He and Choi, 2020; Hershcovich et al., 2020). The overall winner TurkuNLP (Kanerva et al., 2020) transforms enhanced UD into a tree format and then makes use of UDify (Kondratyuk and Straka, 2019). In addition, much work exists on semantic dependency parsing (SDP; Oepen et al., 2014, 2015; May and Priyadarshi, 2017). These works differ from UD-based approaches as the respective formalisms represent meaning less close to syntactic structure, thus not requiring propagation. From a modeling point of view, our work is most similar to that of Grünewald and Friedrich (2020), who also use a graph-based biaffine architecture for enhanced UD parsing, and to that of Dozat and Manning (2018), who achieve state-of-the-art results for SDP.

Coordinate Constructions Dataset
In this section, we describe the creation of our manually curated dataset and analyse the results.

Data
Our dataset consists of 1,417 sentences collected from EWT, 3 containing data from five genres of web media (weblogs, newsgroups, emails, reviews, and Yahoo! answers). 4 The basic dependencies of this UD gold standard have been derived from the original Stanford dependencies (de Marneffe et al., 2006) and were then hand-corrected. The enhanced layer has been created using the automatic converter (Schuster and Manning, 2016, see Appendix A). We retrieve all sentences containing at least one conj link between two verbs. More than 15% of all sentences in EWT contain conjoined verbs. Out of these sentences, we edit all sentences of the dev and test sets, and 999 sentences of the training set, amounting to more than 60% of all relevant sentences in EWT (see Table 1). The careful curation of each sentence took around 10 minutes on average, amounting to a total annotation effort of around 240 hours (total costs ca. $4,750). We exclude 18 sentences when reporting our statistics: In 12 cases, the conj relation is annotated wrongly in the basic layer and six sentences contain syntactically non-standard English. 5

Annotation Methodology
The manual corrections of the treebank were performed by a French native speaker with an extensive background in linguistics. The annotation project involved regular discussions among all authors to decide on uncertain cases and to ensure consistency. Additionally, in case of doubt, an English native speaker with an extensive linguistics background was consulted. Dependencies were checked carefully sentence by sentence using the ConlluEditor tool (Heinecke, 2019). If necessary, the full document was consulted to make sure interpretations were correct in context. First, we check whether links have been propagated correctly, i.e., whether the interpretation of the sentence suggests additional syntactic relations between words.
As each verb may also have its own complements, this task requires a semantic interpretation leveraging context and knowledge about selectional preferences. If an ambiguity has already been resolved in the basic layer, 6 we follow this interpretation unless obviously wrong. Second, we propose to also propagate non-core dependents such as obl, advcl and advmod if suggested by semantics, an annotation task similar to prepositional phrase attachment resolution. We only propagate such links if the adjunct clearly modifies each conjunct (as in Figure 1). Finally, we extend the attachment of relative pronouns (ref ) to all antecedents if involved in coordinations. We focus on propagating dependencies between content words, not propagating relations such as aux or cop, which could be handled as traces.
Inter-annotator agreement study. We sampled 100 sentences, half of them from cases where the primary annotator had judged the original version to be correct, and half from cases that included modifications. This sample was blindly re-annotated by two secondary annotators, both German native speakers with an extensive computational linguistics background. Table 2 shows agreement in terms of precision and recall on the set of dependencies resulting from conjunction propagation, i.e., the links involved in conjunctions that are present in the enhanced layer but not in the basic layer. For a formal definition, see Appendix B. Agreement is generally high, particularly between annotators A and B. Annotator C was more conservative in propagating links, especially in generally ambiguous cases. However, the links that C propagates are also propagated by A and B. Pairwise agreement was high on nsubj, obj and xcomp. Modifier clauses (acl, advcl) and adverbials (advmod) were common sources of disagreement, indicating the more ambiguous nature of these propagations. Pairwise scores and more details can be found in Appendix B.
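The agreement computation over sets of propagated links can be sketched as follows (the edge triples are hypothetical illustrations, not taken from the dataset):

```python
def agreement(gold_edges, system_edges):
    """Precision/recall/F1 between two annotators' sets of propagated
    edges, treating one as the gold standard and the other as the system."""
    overlap = len(gold_edges & system_edges)
    p = overlap / len(system_edges) if system_edges else 0.0
    r = overlap / len(gold_edges) if gold_edges else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Hypothetical edge sets as (head_id, dependent_id, label) triples:
a = {(1, 3, "nsubj"), (1, 5, "obl"), (1, 7, "obj")}
b = {(1, 3, "nsubj"), (1, 7, "obj")}
p, r, f1 = agreement(a, b)
```

Swapping the two arguments swaps precision and recall, while F1 is symmetric, matching the computation described in Appendix B.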

Analysis and Discussion
In this section, we analyse and discuss the modifications made to the original treebanks.
Quantitative Analysis of Changes. Table 3 presents the numbers of dependency relations that have been added and removed in coordinate constructions in the enhanced layer. More specifically, we consider only the set of links not present in the basic tree and count modifications regarding links starting or ending at conjuncts. 7 Counts for coarse-grained labels (e.g., nmod) include all subtypes (e.g., nmod:for) not explicitly listed in the table. During our manual correction of the treebank, around 15% of the total enhanced links involved in conjoined phrases were added and about 3% were removed. This confirms that the converter by Schuster and Manning (2016) is optimized for precision rather than recall, though our additions of course include labels that the converter does not address. Removed relations in Table 3 are caused by fixes regarding attachment in the basic layer, whose errors had been propagated to the enhanced layer. In total, we fixed errors in 57 sentences in the basic layer. In 42 of these, this led to changes in the enhanced layer.
Linguistic Analysis of Changes. One systematic error involves links to subjects in passive constructions: 18 out of 225 nsubj:pass links were actually wrongly propagated. All of them have been changed to nsubj. The reason is that the converter automatically propagates an nsubj:pass link if the first conjoined verb is in the passive form, as, e.g., in "These Shiite movements had been suppressed by Saddam Hussein's regime, but have now organized and armed themselves" (see Figure 2a). Another common error (occurring 12 times) is the propagation of the first conjoined verb's subject to the second verb, even though the latter is in imperative mood, as, e.g., in "I think it was the Lincoln Square area but don't quote me on that" (see Figure 2b).
In sentences containing multiple coordinate constructions, such as "Dr. Fortier and his girlfriend lashed two canoes together and paddled eight kilometres along the Soper River," nsubj links should be present in the enhanced layer between both conjuncts of the subject noun phrase and both verbs. However, in the original treebank, the second subject conjunct was never propagated to the second verb (see Figure 2c). Similarly, we also added many relations in cases of nested coordinations as in "These Shiite movements had been suppressed by Saddam Hussein's regime, but have now organized and armed themselves." The second conjunct of the conjoined verb phrase is a conjoined verb phrase itself, but the nsubj link to "armed" was missing. In total, 194 sentences contain several coordinations, and we modified 92 of them. This phenomenon also accounts for 45 of the added nsubj links.
Some originally missing propagations concern adjectival and adverbial modifiers (acl, amod, advcl, advmod), which are known to be ambiguous cases. In "Handwritten notes and files on a laptop were seized," the adjective "handwritten" clearly modifies the first conjunct "notes" only, but in "Several Indian scholars and politicians have been ready to say and endorse anything," the propagation of "several" and "Indian" was added during our modifications. These cases involve world knowledge that the converter currently does not handle.
Finally, consider the sentence "We recognize that the state may not require religious groups to officiate at, or bless, same-gender marriages." Both conjuncts take "marriages" as their argument, but as an obl and as an obj relation, respectively. The resolution of such non-parallel constructions requires detailed subcategorization information.

Modeling
In this section, we describe three approaches to generating links propagated due to coordination: (1) an improved version of an existing converter (Sec. 4.1); (2) ML-based propagation classification operating on basic trees (Sec. 4.2); and (3) a graphparser based approach for directly predicting edges between tokens (Sec. 4.3). While (1) and (2) may be used to construct "silver standard" enhanced UD graphs from gold trees, (3) is applicable in the automatic parsing setting only.

Modifications to Rule-based Converter
Based on the error analysis in Sec. 3.3, we modify the rule-based converter by Schuster and Manning (2016) as follows. In order to fix errors related to subject propagation in passive and imperative constructions, we take the conjunction dependent's morphological features into account. In the gold standard, the Voice feature is considered to be active by default. Hence, if the conjunction dependent does not have a Voice feature or is explicitly marked as active, an nsubj:pass dependency will be propagated as nsubj. Similarly, if it has the feature Mood=Imp, an nsubj link will not be propagated. Our second modification propagates common adjuncts of verbs as well (obl, advmod, and advcl). We maintain the rule from object propagation that a dependency is only propagated if the dependent comes after the potential target in the sentence. Finally, to handle multiple and nested coordinations, we iterate the converter's conjunction propagation function until the dependency graph does not change any more. This allows dependencies that result from propagation to be propagated themselves, retrieving links that would otherwise be missed.
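The fixpoint iteration for multiple and nested coordinations can be sketched as follows. This is a deliberately simplified stand-in for the converter's propagation function: edges are (head, dependent, label) triples with hypothetical token ids, and the real converter additionally applies label-specific and word-order constraints.

```python
def propagate_until_stable(edges, conj_pairs, propagated_labels):
    """Repeatedly copy qualifying dependencies of a conjunction head to
    its conjunct until the dependency graph no longer changes (fixpoint),
    so that links created by propagation can themselves be propagated."""
    edges = set(edges)
    changed = True
    while changed:
        changed = False
        for head, conjunct in conj_pairs:
            for h, d, lbl in list(edges):
                if h == head and lbl in propagated_labels and d != conjunct:
                    new_edge = (conjunct, d, lbl)
                    if new_edge not in edges:
                        edges.add(new_edge)
                        changed = True
    return edges

# Nested coordination as in "had been suppressed, but have now organized
# and armed themselves": the nsubj link reaches "armed" via "organized".
basic = {(1, 0, "nsubj")}      # suppressed -> movements
conj = [(1, 2), (2, 3)]        # suppressed~organized, organized~armed
enhanced = propagate_until_stable(basic, conj, {"nsubj"})
```

Without the outer loop, only the first conjunct pair would receive the subject link, which mirrors the missing-link errors found in the corpus study.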

Conjunction Propagation Classifiers
The core idea of ML-based conjunction propagation classifiers is to take a basic-layer tree and to decide for each incoming or outgoing dependency of the head of a coordinated phrase whether to propagate this dependency to the other coordinated item(s). We refer to the coordinated nodes as conjunction head and conjunction dependent and to the candidate governor/dependent of the second conjunct as the propagation target. In Figure 1, these three nodes correspond to "wrote," "published" and "1954" (or "PEREZ"/"book"), respectively. The output is a binary decision whether to propagate the given dependency or not. In addition to the features described below, we always provide the candidate dependency label and direction.
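The classifier's input instances can be enumerated from a basic-layer tree roughly as follows (token indices are hypothetical, and the real systems additionally extract the morphological and structural features described below):

```python
def candidate_instances(edges, conj_pairs):
    """Enumerate (head, conjunct, target, label, direction) candidates:
    for every conjunction pair, each incoming or outgoing dependency of
    the conjunction head yields one binary propagate-or-not decision."""
    instances = []
    for head, conjunct in conj_pairs:
        for h, d, lbl in edges:
            if h == head and d != conjunct:   # outgoing: head governs target
                instances.append((head, conjunct, d, lbl, "out"))
            elif d == head:                   # incoming: target governs head
                instances.append((head, conjunct, h, lbl, "in"))
    return instances

# Figure 1 ("wrote and published"): conjunction head 2 = wrote,
# conjunction dependent 4 = published; hypothetical basic-layer edges.
edges = [(2, 1, "nsubj"), (2, 5, "obj"), (2, 0, "obl:in")]
instances = candidate_instances(edges, [(2, 4)])
```

Each tuple is then fed to the SVM or neural classifier together with the candidate dependency label and direction, which are always provided as features.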
SVM-based Classifier. We re-implement the method proposed by Nyblom et al. (2013) using scikit-learn's SVC with a polynomial kernel of degree 2. 8 The features comprise morphological information about the tokens for the conjunction head/dependent and the target, as well as structural tree features extracted from the basic-layer tree. For a detailed description, see Appendix D.
Neural network classifier. We pass the sentence through the transformer-based neural language model RoBERTa (Liu et al., 2019) and extract the word embeddings for the first wordpiece tokens of the conjunction head, the conjunction dependent, and the propagation target. In addition, we use equivalents of the SVM tree features using learned embeddings or one-hot encodings (see Appendix D). The inputs are concatenated and fed to a multi-layer perceptron, which then outputs the binary decision whether to propagate the dependency or not. The multi-layer perceptron consists of two linear layers with hidden sizes 1500 and 500 respectively. We implement the model using Huggingface's Transformers library (Wolf et al., 2019). RoBERTa weights are not fine-tuned.

Graph-Parser Based Edge Prediction
In addition to the above approaches, we also evaluate a graph-parser based approach that predicts dependencies between tokens directly, i.e., which does not rely on a basic-layer tree. Our unfactorized architecture is similar to that of Grünewald and Friedrich (2020), i.e., our model predicts presence of edges and the corresponding labels in a single step, treating nonexistence of an edge as simply another label (∅). As we focus on the dependencies involved in conjunctions, we do not require the parser's output to constitute valid graphs.
Embeddings for input tokens are generated by feeding gold tokens to the RoBERTa tokenizer and then running the resulting word-pieces through the RoBERTa-large model. We then generate an embedding r i for the token at position i by forming a weighted sum of the hidden layers' embeddings at the positions corresponding to the first word-piece token of the original token as suggested by Kondratyuk and Straka (2019). Weights for this scalar mixture of layers are learned during training. Layers are randomly dropped during training to prevent the model from focusing on only a single layer.
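The scalar mixture of layers can be sketched in a few lines of pure Python with toy dimensions; in the model, the weights (and a global scaling factor, here `gamma`) are learned parameters, and layer dropout is applied during training:

```python
import math

def scalar_mix(layers, weights, gamma=1.0):
    """Weighted sum of per-layer embeddings for one token: softmax the
    scalar layer weights, then mix the layers' vectors."""
    exps = [math.exp(w) for w in weights]
    total = sum(exps)
    probs = [e / total for e in exps]
    dim = len(layers[0])
    return [gamma * sum(p * layer[i] for p, layer in zip(probs, layers))
            for i in range(dim)]

# Three hidden layers of a 2-dimensional toy encoder, equal weights:
mixed = scalar_mix([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0])
```

With equal weights, each layer contributes one third, so the result is the plain average of the layer vectors.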
For each input embedding $r_i$, we create a head representation $h^{\text{head}}_i$ and a dependent representation $h^{\text{dep}}_i$ via two feed-forward neural networks. For each ordered pair $(i, j)$ of tokens, we feed their respective head and dependent representations to a biaffine classifier (Dozat and Manning, 2017) predicting logits $s_{i,j}$ over the possible dependency labels:

$s_{i,j} = (h^{\text{head}}_i)^\top U \, h^{\text{dep}}_j + W (h^{\text{head}}_i \oplus h^{\text{dep}}_j) + b$

We use these logits to extract the probabilities $P(y_{i,j}) = \mathrm{softmax}(s_{i,j})$ for each label. $U$, $W$ and $b$ are learned parameters; $\oplus$ denotes concatenation. The model is trained to minimize cross-entropy loss w.r.t. the true dependency label between each pair of tokens. If a token is not assigned any head due to $\varnothing$ scoring highest for all other tokens, we assign the highest-scoring non-$\varnothing$ relation and the corresponding head.
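A toy version of the biaffine scoring step, assuming NumPy; the dimensions are illustrative, and in the model $U$, $W$ and $b$ are trained rather than randomly drawn:

```python
import numpy as np

def biaffine_scores(h_head, h_dep, U, W, b):
    """Biaffine label scores for one ordered token pair (i, j):
    a bilinear term plus a linear term over the concatenation plus a bias.
    Toy shapes: h_head (d,), h_dep (d,), U (L, d, d), W (L, 2d), b (L,)."""
    bilinear = np.einsum("i,lij,j->l", h_head, U, h_dep)
    linear = W @ np.concatenate([h_head, h_dep])
    return bilinear + linear + b

def label_probs(scores):
    """Softmax over the label set (including the no-edge label)."""
    e = np.exp(scores - scores.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d, L = 4, 3                       # toy: 4-dim representations, 3 labels
s = biaffine_scores(rng.normal(size=d), rng.normal(size=d),
                    rng.normal(size=(L, d, d)), rng.normal(size=(L, 2 * d)),
                    np.zeros(L))
p = label_probs(s)
```

In the unfactorized setting described above, one of the L labels is the no-edge label, so edge presence and labeling are decided in this single softmax.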
The model is simply trained to predict all link types in enhanced UD graphs. In the training section of the EWT corpus, we replace every sentence that contains a coordinated verb phrase with our manually corrected version of that sentence, or remove it from the corpus if it is one of the 927 conjunction sentences in the training section which we did not correct. For hyperparameter settings, see Appendix E.

Experiments
In this section, we describe our experiments on creating enhanced UD representations for coordinate constructions. Analogous to Nyblom et al. (2013), we measure precision, recall and F1 on enhanced links that are the result of propagation in coordinate constructions. For all experiments, we use gold sentence segmentation and tokenization, and evaluate on our manually corrected sentences from the dev and test sets of the EWT corpus.

Gold Standard Treebank Enhancing
We first address the research question of how to best generate enhanced representations for treebanks with gold standard basic annotations. We compare the following models: (1) an "Always" baseline, which simply propagates all incoming and outgoing links from the conjunction head to the conjunction dependent(s); (2) the rule-based converter by Schuster and Manning (2016) and the variations thereof we developed inspired by our corpus study; (3) our re-implementation of the SVM-based classifier by Nyblom et al. (2013); and (4) our neural-network (NN) based classifier. The latter uses AdamW (Loshchilov and Hutter, 2017) with a learning rate of 5e-5, a batch size of 1 and early stopping. Table 4 reports the results on the development and test sets of our manually verified conjunction dataset. The recall of the "Always" baseline is not at 100% because a small number of relations change their label during propagation, e.g., nsubj→nsubj:pass.
Rule-based conversion. We show results for successively adding components to the original converter (RBC). On the test set, adding propagation of non-core dependents and allowing several iterations increases recall and improves F1 by more than 2 points. On the dev set, in contrast, we do not observe these effects. 9 Adding our suggested passive/imperative fix surprisingly decreased performance. Analysis showed that the cases that our converter got wrong were caused by erroneous morphological feature annotations in the basic layer. In sum, our suggested improvements (RBC2) of heuristically propagating adjuncts (obl, advmod, acl) and allowing several resolution passes of the converter seem to improve treebank enhancing, provided that the basic layer is correct.
ML-based conversion. Overall, the SVM and NN models show similar performance. As they perform already close to human agreement (see Table 2), further improvement may actually indicate overfitting. On the test set, the ML-based methods outperform the heuristic rule-based methods, surpassing the original converter by over 4 points F1. We conclude that learning structural rules based on actual gold standard data is more effective than hand-designing them. Differences on the dev set are less pronounced despite models being optimized on this data, again hinting to some qualitative differences between the two sets.
In order to determine which sources of information are most relevant, we perform ablation experiments for both classifiers. The features representing the candidate dependency label and the direction of the link are essential and kept in each case. Both the SVM and the NN classifiers draw most of their information from tree-based features. This effect is particularly pronounced for the SVM classifier, where performance drops by 10 to almost 20 points F1 when omitting these features. The NN classifier's performance does not deteriorate as strongly under the same condition, indicating that some syntactic information can also be retrieved from contextualized word embeddings (see e.g., Tenney et al., 2019). Nonetheless, in most experiments, adding token features improves performance slightly, showing that they do contain important information for propagation decisions.

Propagating Conjunction Links in Automatic Parsing Setting
For the scenario of parsing from raw tokens, we compare two state-of-the-art parsers, StanfordNLP (Qi et al., 2018) and UDify (Kondratyuk and Straka, 2019), combined with the rule-based converter or ML-based conjunction propagators, and our graphparser based edge predictor. The latter is trained on the subset of training sentences that either do not contain coordinated verb phrases or that were corrected by us. Hyperparameters and training settings are given in Appendix E.
Results for these experiments can be found in Table 5. The impact of the quality of the parsed basic dependencies is evident: Results are much better for the UDify parser (LAS F1 of 89.4 for basic dependencies on the EWT dev set) than for StanfordNLP (LAS F1 of 87.4). In the automatic setting, our heuristic extensions improve results compared to using the original converter, and there is no decrease in F1 on dev. As in the gold standard setting, ML-based extensions improve upon RBC on test, but not on dev. Of the systems based on basic-layer tree parsers, RBC2 works best. However, all pipeline systems show rather poor performance, at or below an F1 of 70. Our graph-parser based edge predictor achieves by far the best results, outperforming all other models by a margin of over 7 points F1. This shows that in an automatic setting, the most robust results are achieved by directly inducing dependency links between tokens, modeling conjunction only indirectly.
To estimate the impact of our corrections to the gold standard, we also train the graph parser on uncorrected data. The model trained on the corrected data has higher recall, but lower precision. This is expected to some extent as we introduce semantically motivated propagations of adjuncts, and we suspect that they may require a larger training set.

Discussion
The main insights comparing our experiments in the gold standard vs. the automatic parsing setting are as follows. Overall, our heuristic extensions for the rule-based converter are beneficial in both settings. In the gold setting, ML-based extensions lead to higher accuracy; when applied to noisy parser output, they do not work well. However, using one end-to-end machine-learning model to directly generate enhanced representations for conjunctions outperforms the pipeline versions. A possible reason for this might be that these models were all developed on gold data, while the graph-based parser does not rely on potentially wrong structural tree features and is also able to use internal confidence information for edges. Another advantage of the end-to-end model may stem from the fact that its training allows it to leverage semantic information from a larger number of dependency links in the training data, i.e., including those not occurring in coordinate constructions. This points to a promising future research direction, i.e., generating additional semi-artificial training data for conjunction propagation.

Conclusion and Outlook
We have presented a large-scale manually curated dataset for conjunction propagation in English. In contrast to previous work focusing on high-precision rule-based propagation, we propagate links in all cases that semantically suggest argument or adjunct sharing. In the gold standard treebank enhancing setting, we found ML-based models to outperform the de-facto standard rule-based converter by learning to exploit mostly structural features. However, one of our main insights is that neither rule-based nor ML-based classifiers work well on noisy parser output precisely because of this reliance on structural information. We propose to use a graph-parser based edge predictor instead and show that it outperforms pipeline-based models by a large margin. Our model reaches F1 scores between 0.75 and 0.78 with a precision of more than 0.82, a level of performance that may already be useful in downstream tasks.
Our models could be used for creating high-quality enhanced-level representations of conjunctions for the remaining English data, and could thus help in a UD community effort to continuously improve the UD treebanks. Future work also includes the study of conjunction propagation methods for further languages. Our in-depth study on English data provides several insights that we expect to be transferable cross-linguistically. First, conjunction propagation can to some extent be addressed using heuristic rules, but capturing the full semantic nature of the task requires manual annotation. Second, given appropriate training data, our machine-learning based approaches are also applicable to other languages.
In addition, it would be interesting to see if manually annotated data for coordinate constructions may be useful in natural language understanding tasks such as natural language inference (NLI). This is especially true for "stress test" datasets such as CONJNLI (Saha et al., 2020), which are designed to specifically test models' capabilities to process coordination.
Finally, as morphological features are generally important for this task, improving their automatic prediction (see e.g., Ramm et al., 2017; Myers and Palmer, 2019) as well as UD's gold standard seems to be a promising way to go. Our work has demonstrated the value of a linguistically motivated corpus study of a syntactic-semantic phenomenon, and shown that, given manually curated data, rules for conjunction propagation can be learned effectively.

B Inter-annotator agreement: details

Formally, the set $E^l_A$ is the set of enhanced-layer edges that are (i) not present in the basic layer and (ii) involved in conjunctions as incoming or outgoing links of the conjuncts, with label $l$, marked by annotator A. We also count the overlap of links for pairs of annotators. Using these counts, we then compute precision, recall and F1, treating one annotator as the system and one as the gold standard. For instance, when treating A as the gold standard and B as the system, this leads to:

$P = \frac{|E^l_A \cap E^l_B|}{|E^l_B|}, \quad R = \frac{|E^l_A \cap E^l_B|}{|E^l_A|}$

Note that when reversing this order, P and R are simply swapped; F1 stays the same.
The following numbers compare each annotator to the original gold standard (not in tables). For modifier clauses (acl, advcl) and adverbials (advmod), B was the most aggressive in propagating dependencies, adding 55 links in total for these labels, while A and C added only 39 and 32 links, respectively. While all annotators propagated obl dependencies to roughly the same extent, agreement was high between A and B but lower (F1 64-68%) between C and the others, indicating that there are more ambiguities among these dependencies as well. Annotator C is generally more conservative in propagating dependencies. This is reflected in the relatively low recall when comparing to the other annotators, as well as in the lower overall number of added links (285, as compared to 309 for A and 312 for B).

Table 7: Agreement of Annotator A vs. Annotator C on links involved in coordinate constructions in the enhanced layer. For P/R computation, A was treated as the gold standard and C as the system.

Table 8: Agreement of Annotator B vs. Annotator C on links involved in coordinate constructions in the enhanced layer. For P/R computation, B was treated as the gold standard and C as the system.

Table 9: Statistics of modifications made to 1,417 sentences of the EWT, including both basic and enhanced layer. #sents reports the number of sentences in which the respective reported changes were made, #total reports the number of occurrences of the label in the original treebank.

D ML-based classifiers: features

Table 10 lists the features used in our SVM and NN models. Token features are extracted for the conjunction head, the conjunction dependent, and the propagation target. In addition to the listed features, we also experimented with including lemmas and POS tags, but did not find them to be useful in our ablation experiments.

E Graph-based edge predictor: Training Setup

Label lexicalization. At training time, we use a limited label set of 56 labels in which lexical material is replaced with placeholders, such as obl:[case]. At prediction time, we retrieve the missing lexical material from the dependency graph in a rule-based fashion. In the simplest case, this means substituting the word form of the dependent of the required type (e.g., of a case relation). In conjunctions, the token in question may not have its own dependent of the correct type, instead "inheriting" it from its conjunction head. In that case, we retrieve the lexical material from the conjunction head's dependent.
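The rule-based retrieval of lexical material can be sketched as follows (edges as (head, dependent, label) triples; the token ids, word forms, and exact fallback order are illustrative, not the actual implementation):

```python
def lexicalize(label_template, token_id, edges, forms, conj_head=None):
    """Fill a delexicalized label such as 'obl:[case]' with the word form
    of the token's own dependent of the required type; if the token has no
    such dependent, fall back to its conjunction head's dependent."""
    if "[" not in label_template:
        return label_template           # plain label, nothing to fill
    base, needed = label_template.rstrip("]").split(":[")
    for owner in (token_id, conj_head):
        if owner is None:
            continue
        for h, d, lbl in edges:
            if h == owner and lbl == needed:
                return f"{base}:{forms[d].lower()}"
    return base                         # no lexical material found

# "In 1954 ... wrote and published": the case marker "In" attaches to the
# first conjunct (wrote, id 2); "published" (id 4) inherits it.
edges = [(2, 0, "case")]
forms = {0: "In", 1: "1954", 2: "wrote", 4: "published"}
filled = lexicalize("obl:[case]", 4, edges, forms, conj_head=2)
```

This inheritance step is exactly where conjunctions complicate lexicalization: the second conjunct shares the adjunct's case marker without governing it directly.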
Hyperparameters. We perform only a minimal amount of hyperparameter tuning, mostly sticking with the values used by Kondratyuk and Straka (2019). One notable exception is the training regime, where we found low batch size and the AdamW optimizer to yield the best results. The full hyperparameter configuration can be found in Table 11.