ConReader: Exploring Implicit Relations in Contracts for Contract Clause Extraction

We study automatic Contract Clause Extraction (CCE) by modeling implicit relations in legal contracts. Existing CCE methods mostly treat contracts as plain text, creating a substantial barrier to understanding contracts of high complexity. In this work, we first comprehensively analyze the complexity issues of contracts and distill out three implicit relations commonly found in contracts, namely, 1) the Long-range Context Relation, which captures the correlations of distant clauses; 2) the Term-Definition Relation, which captures the relation between important terms and their corresponding definitions; and 3) the Similar Clause Relation, which captures the similarities between clauses of the same type. We then propose a novel framework, ConReader, that exploits these three relations for better contract understanding and improved CCE. Experimental results show that ConReader makes predictions more interpretable and achieves a new state of the art on two CCE tasks in both conventional and zero-shot settings.


Introduction
Legal Contract Review is the process of thoroughly examining a legal contract before it is signed to ensure that the content stated in the contract is clear, accurate, complete, and free from risks. A key component of this application is Contract Clause Extraction (CCE), which aims to identify key clauses from the contract for further in-depth review and risk assessment. Typically, CCE consists of two major tasks targeting different query granularities for real-life usage: Clause Analysis (CA) and Clause Discovery (CD), where CA aims to identify clauses that belong to a general clause type, while CD aims to identify clauses similar to a specific clause (depicted in Figure 1). CCE is both expensive and time-consuming, as it requires legal professionals to manually identify a small number of key clauses from contracts hundreds of pages in length (Hendrycks et al., 2021). Therefore, there is a pressing need for automating CCE, which assists legal professionals in analyzing long and tedious documents and provides non-professionals with immediate legal guidance.
The biggest challenge to automating CCE is the complexity of contracts. In the literature, most pretrained language models, which simply treat contracts as plain text, perform poorly on CCE (Devlin et al., 2019; Liu et al., 2019). Some works try to simplify CCE from the perspective of contract structure. For example, Chalkidis et al. (2017) assign a fixed extraction zone for each clause type and limit clauses to be extracted only from their corresponding extraction zones. Hegel et al. (2021) use visual cues of document layout and placement as additional features to understand contracts. However, their local-context assumption is not flexible and, more seriously, neglects more complicated relations inherent in the contracts.
In fact, as shown in Figure 1, contracts are formal documents that typically follow a semi-structured organization. The body of a contract is usually organized into predefined articles such as "Definitions" and "Terminations", where relevant clauses are described in order. Different articles may hold different levels of importance. For example, the "Definitions" article is globally important because it clearly defines all important terms that are frequently referenced, while other articles are sparsely correlated and hold only local importance. We attempt to decompose these complexities into a set of implicit relations, which can be exploited to better understand contracts.

Figure 1: An example contract (a Co-Promotion Agreement between Dova Pharmaceuticals, Inc. and Valeant Pharmaceuticals North America LLC) with the three implicit relations illustrated.

Therefore, as shown in Figure 1, we identify three implicit relations to directly tackle the complexities from three aspects. 1) The implicit logical structure among distant text: this originates from the fact that a clause in one article may refer to clauses in distant articles. However, most pretrained language models (e.g., BERT) inevitably break the correlations among clauses because they must split a contract into multiple segments for separate encoding due to the length limitation. Therefore, we define a Long-range Context Relation (LCR) to capture the relations between different segments and preserve the correlations among clauses.
2) The unclear legal terms: legal terms need to be clearly and precisely declared to minimize ambiguity. Thanks to the "Definitions" article, we can easily find the meaning of a particular term. We then define the relation between each term and its definition as the Term-Definition Relation (TDR). The clarity of TDR allows consistent information flow by enhancing terms with semantics-rich definitions. 3) The ambiguity among clauses: it is usually hard to differentiate different types of clauses from their text formats alone. For example, clauses of the types "Expiration Date" and "Agreement Date" both appear as dates. This leads to the third relation, the Similar Clause Relation (SCR). SCR captures the similarity of clauses of the same type across contracts. It enriches a clause's semantics with its unique type information and thus maintains the discrimination among different clause types. Note that LCR and TDR are intra-contract relations, while SCR is an inter-contract relation.
In light of the above investigations into the complexities of contracts, we propose a novel framework, ConReader, that tackles two CCE tasks by exploiting the above three relations for better contract understanding. Concretely, we reserve a small number of token slots in the input segments for later storage of the three kinds of relational information. To prepare the intra-contract relations (LCR and TDR), we obtain segment and definition representations from pretrained language models. Regarding the inter-contract relation (SCR), since the number of possible clause pairs grows with the number of contracts, we cannot enumerate all SCRs. Instead, we let input segments interact with a Clause Memory that stores recently visited clauses, where a clause retriever is adopted to retrieve similar clauses from the memory. We then enrich each segment by filling the reserved slots with context segments, relevant definitions, and retrieved similar clauses. Finally, a fusion layer is employed to simultaneously learn relevant information from both the local context (i.e., within the segment) and the global context (i.e., via the implicit relations) for extracting the target clause.
To summarize, our main contributions are threefold: • This work targets automatic CCE. We comprehensively analyze the complexity issues of modeling legal contracts and distill out three implicit relations, which have hardly been discussed before.

Framework
Overview We formulate CCE as extractive Question Answering (QA) (Rajpurkar et al., 2016). Let $\{c_m\}_{m=1}^{M}$ be a contract in the form of multiple segments and $q$ be a query, represented either as a clause type in the CA task or as a specific clause in the CD task. Our goal is to extract the clauses $\{y_k\}_{k=1}^{K}$ corresponding to the query. There may be multiple or no correct clauses; each clause, if present, is a text span in a particular segment, denoted by its start and end indices.
Figure 2 depicts the overview of ConReader, which consists of four main components: • LCR Solver tackles LCR by encoding the wrapped segments $\{x_m\}_{m=1}^{M}$, aware of the query $q$ and the reserved slots $r$, into hidden states $\{h^{lcr}_m\}_{m=1}^{M}$, where the overall segment representations are stored in a segment bucket $B^{lcr}$.
• TDR Solver tackles TDR by encoding all definitions $\{d_n\}_{n=1}^{N}$ from the contract into hidden states $\{h^{tdr}_n\}_{n=1}^{N}$, where the overall definition representations are stored in a definition bucket $B^{tdr}$.
• SCR Solver tackles SCR by retrieving similar clause representations $\{\tilde{h}^{scr}_m\}_{m=1}^{M}$ from a Clause Memory $\mathcal{M}$ according to a similarity function $f(\cdot,\cdot)$ between the segment and the stored clauses.
• Aggregator enriches each segment representation with the three kinds of relational information for extracting the target clause.

Long-range Context Relation Solver
The goal of the LCR Solver is to output all segment representations in a contract despite the length limitation of pretrained language models. Meanwhile, to allow flexible relation modeling in the later Aggregator, we reserve some token slots for the storage of relational information before encoding. Specifically, we concatenate each segment with the query and the reserved token slots to form an input sequence within the length limitation: $x_m = \texttt{[CLS]}\ q\ \texttt{[SEP]}\ c_m\ \texttt{[SEP]}\ r$, where the reserved slots take only a small portion of the entire sequence ($|r| \ll 512$) such that they only slightly affect efficiency. It does not matter which token is chosen as the placeholder, since we directly mask these slots so that they neither affect the hidden states of the query and segment tokens nor receive gradients for update.
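The slot-reservation step above can be sketched as follows. This is our own minimal illustration, not the authors' code; the token budget arithmetic and the function name `build_input` are assumptions.

```python
# Hypothetical sketch of the input construction with reserved slots.
CLS, SEP, PAD = "[CLS]", "[SEP]", "[PAD]"
MAX_LEN = 512
NUM_SLOTS = 30  # |r| in the paper

def build_input(query_tokens, segment_tokens, num_slots=NUM_SLOTS, max_len=MAX_LEN):
    """Concatenate query, segment, and reserved placeholder slots.

    Returns (tokens, attention_mask); slot positions are masked out (0)
    so they do not influence query/segment hidden states until filled.
    """
    budget = max_len - num_slots - 3  # room for [CLS] and two [SEP]s
    segment_tokens = segment_tokens[: budget - len(query_tokens)]
    tokens = [CLS] + query_tokens + [SEP] + segment_tokens + [SEP] + [PAD] * num_slots
    mask = [1] * (len(tokens) - num_slots) + [0] * num_slots
    return tokens, mask

toks, mask = build_input(["expiration", "date"], ["the", "term", "ends", "in", "2020"])
assert len(toks) == len(mask)
assert mask[-NUM_SLOTS:] == [0] * NUM_SLOTS  # slots masked until the Aggregator fills them
```

The choice of `[PAD]` as the placeholder is arbitrary, mirroring the paper's remark that the masked slots neither affect other hidden states nor receive gradients.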

Shared Encoder
Then, we apply a RoBERTa encoder $\mathrm{Enc}(\cdot)$ to get the hidden states for each input sequence: $h^{lcr}_m = \mathrm{Enc}(x_m)$, where $h^{lcr}_m \in \mathbb{R}^{|x_m| \times h}$ and $h$ is the hidden dimension. To reflect the order of segments in a contract, we also add a segment positional embedding (Vaswani et al., 2017) to the hidden state $h^{lcr}_{m,\mathrm{cls}}$ at [CLS] to get the segment representation for each input segment: $\tilde{h}^{lcr}_m = h^{lcr}_{m,\mathrm{cls}} + \mathrm{Pos}(m)$, where $\mathrm{Pos}(\cdot)$ is a standard RoBERTa positional encoder. All segment representations are temporarily stored in a segment bucket $B^{lcr} = \{\tilde{h}^{lcr}_m\}_{m=1}^{M}$.
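A toy sketch of building the segment bucket is shown below. We use the sinusoidal encoding from Vaswani et al. (2017) as a stand-in for RoBERTa's learned positional embeddings, and random vectors in place of real [CLS] states; both are assumptions for illustration only.

```python
import numpy as np

def sinusoidal_pos(m, h):
    # Standard Transformer positional encoding for position m (stand-in
    # for the learned Pos(.) in the paper).
    i = np.arange(h)
    angles = m / np.power(10000.0, (2 * (i // 2)) / h)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

h = 8
rng = np.random.default_rng(0)
# h_{m,cls}^{lcr}: the [CLS] hidden state of each of 3 segments (toy vectors)
cls_states = [rng.standard_normal(h) for _ in range(3)]
# Segment bucket B^{lcr}: [CLS] state plus segment-order embedding
segment_bucket = [cls + sinusoidal_pos(m, h) for m, cls in enumerate(cls_states)]
assert len(segment_bucket) == 3 and segment_bucket[0].shape == (h,)
```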

Term-Definition Relation Solver
TDR Solver is responsible for providing the specific definitions for terms that may raise ambiguity.
Algorithm 1: SCR Solver (training)
1: Initialize memory partitions $\mathcal{M}[l]$ for clause types $l = 1, \ldots, L$;
2: Get hidden states of segments $\{h^{lcr}_m\}_{m=1}^{M}$ from Section 2.1 using $q$ and $\{c_m\}_{m=1}^{M}$;
3: Get the clause type $l_q$ according to the query $q$;
4: // retrieve clauses
5: for segment $m = 1, 2, \ldots, M$ do
6:     Retrieve a similar clause $\tilde{h}^{scr}_m$ for each segment via Equation (5);

It can be observed in Figure 1 that definitions are well organized in the "Definitions" article. Therefore, we use regular expressions with keywords such as "shall mean" and "mean" to automatically extract these definitions. We then prepare the definition inputs as $d_n = \texttt{[CLS]}\ k_n\ \texttt{[SEP]}\ v_n$, where each definition is presented as a key-value pair: the key $k_n$ denotes a legal term in the contract and the value $v_n$ denotes its corresponding definition text. We then apply the same RoBERTa encoder to encode these definitions into hidden states $h^{tdr}_n$, where the hidden state $h^{tdr}_{n,\mathrm{cls}}$ at [CLS] is taken as the definition representation $\tilde{h}^{tdr}_n$; these are temporarily stored in a definition bucket $B^{tdr} = \{\tilde{h}^{tdr}_n\}_{n=1}^{N}$.
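A minimal sketch of the regex-based definition extraction is below. The paper does not give its exact patterns; this one keys on the quoted-term / "shall mean" convention visible in Figure 1 and is an assumption.

```python
import re

# Hypothetical pattern: a quoted term followed by "shall mean"/"mean(s)"
# and the definition text up to the next period.
DEF_PATTERN = re.compile(
    r'"(?P<term>[^"]+)"\s+(?:shall\s+mean|means?)\s+(?P<definition>[^.]+)\.'
)

def extract_definitions(text):
    """Return (term, definition) key-value pairs (k_n, v_n) found in the text."""
    return [(m.group("term"), m.group("definition").strip())
            for m in DEF_PATTERN.finditer(text)]

article = ('1.47 "Dova Trademarks and Copyrights" shall mean the logos, '
           'trade dress, slogans, domain names and housemarks of Dova.')
pairs = extract_definitions(article)
assert pairs[0][0] == "Dova Trademarks and Copyrights"
assert pairs[0][1].startswith("the logos")
```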

Similar Clause Relation Solver
Since SCR is an inter-contract relation, we are unlikely to enumerate all possible clause pairs.Therefore, we maintain a Clause Memory M to: (1) dynamically store clauses of all types; and (2) allow input segments to retrieve similar clauses according to a similarity function f (•, •).Details can be found in Algorithm 1.
Dynamic Update of $\mathcal{M}$ During training, we assume each query $q$ implies a particular clause type $l_q$ (the query in CA is itself a clause type, while the query in CD belongs to a clause type), where we have $L$ clause types in total. Initially, $\mathcal{M}$ allocates the same memory space of size $|\mathcal{M}|$ to each clause type to store the corresponding clause representations. Suppose that we get $h^{lcr}_m$ from the LCR Solver for $x_m$ and that $x_m$ contains a clause $y$ of type $l_q$ corresponding to the given query $q$. We denote its clause representation $h_y$ as the concatenation of its start and end token representations: $h_y = [h^{lcr}_{m,s} : h^{lcr}_{m,e}]$, where $[\cdot : \cdot]$ denotes vector concatenation, and $s$ and $e$ are the start and end indices of $y$ inside $x_m$. When encountering such a clause, we add $h_y$ to its corresponding memory partition $\mathcal{M}[l_q]$. If the partition is full, we follow the first-in first-out (FIFO) principle and remove the earliest clause representation stored in $\mathcal{M}[l_q]$ to make room for the new one, so that the stored clause representations are always up-to-date.
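The FIFO partitioned memory can be sketched with a bounded deque per clause type. This is an illustrative sketch (class name and strings are our own); real entries would be the concatenated start/end vectors $h_y$.

```python
from collections import deque

class ClauseMemory:
    """Per-type FIFO partitions (|M| = 10 per partition in the paper)."""

    def __init__(self, num_types, size=10):
        self.partitions = [deque(maxlen=size) for _ in range(num_types)]

    def add(self, clause_type, clause_repr):
        # deque(maxlen=...) drops the oldest entry when full -> FIFO update,
        # so stored representations stay up-to-date.
        self.partitions[clause_type].append(clause_repr)

    def __getitem__(self, clause_type):
        return list(self.partitions[clause_type])

mem = ClauseMemory(num_types=41, size=2)
mem.add(0, "clause_a"); mem.add(0, "clause_b"); mem.add(0, "clause_c")
assert mem[0] == ["clause_b", "clause_c"]  # earliest entry evicted
```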
Retrieve Clauses from $\mathcal{M}$ When asked to identify clauses of type $l_q$, we allow each input segment to retrieve a similar clause from the Clause Memory. The retrieved clause carries the semantic and contextual information of this type of clause in other contracts, facilitating the extraction of the same type of clause in the current contract. Specifically, given the hidden states of the input sequence $h^{lcr}_m$, a query $q$ of type $l_q$, and the Clause Memory $\mathcal{M}$, we restrict retrieval to the corresponding memory partition $\mathcal{M}[l_q]$ during training so as to retrieve truly similar (i.e., same-type) clauses that provide precise guidance for clause extraction in the current contract: $\tilde{h}^{scr}_m = \arg\max_{h_y \in \mathcal{M}[l_q]} f(h^{lcr}_{m,\mathrm{cls}}, h_y)$ (5), where $f(h^{lcr}_{m,\mathrm{cls}}, h_y) = \cos(h^{lcr}_{m,\mathrm{cls}} W^{lcr}, h_y W^{y})$, and $W^{lcr} \in \mathbb{R}^{h \times h}$ and $W^{y} \in \mathbb{R}^{2h \times h}$ are parameters projecting $h^{lcr}_{m,\mathrm{cls}}$ and $h_y$ into the same space. To make the retriever trainable so that it can learn to capture the common characteristics of clauses of the same type, we introduce a Retrieval Loss $\mathcal{L}_r$ that minimizes a contrastive objective (Hadsell et al., 2006), where a negative clause $h_{y^-} \in \mathcal{M} \setminus \mathcal{M}[l_q]$ is randomly sampled.
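The retrieval step of Equation (5) can be sketched as projected cosine similarity followed by an argmax over the partition. Random matrices stand in for the learned projections $W^{lcr}$ and $W^{y}$; everything here is toy-sized for illustration.

```python
import numpy as np

h = 4
rng = np.random.default_rng(0)
W_lcr = rng.standard_normal((h, h))     # projects h^{lcr}_{m,cls}
W_y = rng.standard_normal((2 * h, h))   # projects h_y (start:end concat, 2h-dim)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(seg_cls, partition):
    """Return the stored clause representation most similar to the segment."""
    scores = [cosine(seg_cls @ W_lcr, hy @ W_y) for hy in partition]
    return partition[int(np.argmax(scores))]

partition = [rng.standard_normal(2 * h) for _ in range(3)]  # M[l_q]
seg_cls = rng.standard_normal(h)
best = retrieve(seg_cls, partition)
assert any(best is hy for hy in partition)
```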

Aggregator
After obtaining the relational information from the corresponding relation solvers, we fill these representations into the reserved token slots and allow the new segment sequence to learn the three implicit relations automatically via a fusion layer.
For LCR and TDR, not all segment or definition representations in the corresponding buckets are necessary for each input segment, as they may be repeated (for LCR) or out of the segment's scope (for TDR). Therefore, for the $m$-th input segment, we remove its own segment representation $\tilde{h}^{lcr}_m$ from the bucket and only keep the definition representations whose terms appear in this segment. For SCR, each segment is paired with the one retrieved clause representation. After filling all corresponding representations into the reserved slots, we get the final hidden state for each segment: $h_m = [h^{lcr}_{m,\mathrm{cls:sep_2}};\ B^{lcr} \setminus \{\tilde{h}^{lcr}_m\};\ \{\tilde{h}^{tdr}_n : k_n \in c_m\};\ \tilde{h}^{scr}_m]$, where $h^{lcr}_{m,\mathrm{cls:sep_2}}$ denotes the hidden states ranging from [CLS] to the second [SEP] in $h^{lcr}_m$. Note that we do not set a specific number of reserved slots for each relation, but only ensure that the total does not exceed $|r|$. The reserved slots taken by these representations are unmasked to enable computation and gradient flow. Then $h_m$ passes through a fusion layer to automatically learn the three implicit relations: $o_m = \mathrm{Fusion}(h_m)$, where $\mathrm{Fusion}(\cdot)$ is a standard RoBERTa layer with randomly initialized parameters and $o_m$ is the relation-aware hidden states for the $m$-th segment.
We use $o_m$ to extract clauses: $P_s(m) = \mathrm{softmax}(o_m W_s)$ and $P_e(m) = \mathrm{softmax}(o_m W_e)$, where $P_s(m)$ and $P_e(m)$ denote the probabilities of each token being the start and end position respectively, and $W_s, W_e \in \mathbb{R}^{h \times 1}$ are the corresponding parameters. The Extraction Loss $\mathcal{L}_e$ is defined as the cross-entropy between the predicted probabilities and the ground-truth start and end positions.
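The extraction head amounts to two per-token distributions plus cross-entropy on the gold span. A numpy sketch under toy dimensions (random states stand in for the fused $o_m$):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

seq_len, h = 6, 4
rng = np.random.default_rng(1)
o_m = rng.standard_normal((seq_len, h))      # relation-aware hidden states
W_s, W_e = rng.standard_normal(h), rng.standard_normal(h)

p_start = softmax(o_m @ W_s)   # P_s(m): prob of each token being the start
p_end = softmax(o_m @ W_e)     # P_e(m): prob of each token being the end

# Extraction Loss: cross-entropy against gold start/end positions
# (index 0 = [CLS] marks the "no clause" case, per Section "Training").
gold_start, gold_end = 2, 4
loss = -np.log(p_start[gold_start]) - np.log(p_end[gold_end])
assert np.isclose(p_start.sum(), 1.0) and loss > 0
```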

Training & Prediction
Training During training, we assume that the clause type for each input query is available and follow ConReader to compute $\mathcal{L}_r$ and $\mathcal{L}_e$; the final training objective is their sum, $\mathcal{L} = \mathcal{L}_r + \mathcal{L}_e$. If no clause can be extracted for the current query, we set both the start and end positions to 0 (i.e., [CLS]).
Prediction At prediction time, we may encounter zero-shot scenarios where the clause types fall outside the existing $L$ types and, more seriously, CD essentially does not provide the clause type for each query clause. This would prevent ConReader from generalizing to these scenarios, as we cannot indicate which memory partition of $\mathcal{M}$ to retrieve from. To address this limitation, we allow retrieval over the entire clause memory (the condition in Equation (5) becomes $h_y \in \mathcal{M}$), since the retriever has already learned to effectively capture the common characteristics of similar clauses. To handle the extraction of multiple clauses, we follow Hendrycks et al. (2021) and output the top $T$ clauses in the contract according to $P_s(m)_i \times P_e(m)_j$, where $i$ and $j$ denote candidate start and end positions.

Experimental Settings

Datasets For CA, we use the training set of CUAD to train a ConReader model. We evaluate it on the test set of CUAD for the conventional setting and on the development and test sets of Contract Discovery for the zero-shot setting. For CD, since we now have a training set from CUAD, we apply the same supervised extractive QA setting, where one clause is to be extracted conditioned on the query clause, instead of the original unsupervised sentence-matching formulation. Similar to Borchmann et al. (2020), we sub-sample $k$ ($k = 5$ in our work) clauses for each clause type and split them into $k-1$ seed clauses and 1 target clause. Then, we pair each of the seed clauses with the contract containing the target clause to form $k-1$ CD examples. By repeating this process, we obtain CD datasets for both training and evaluation. Similar to CA, we train another model for CD and evaluate it in both settings. Details of the data statistics can be found in Appendix A.1.
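The CD example construction described above can be sketched as follows. The dict-based layout and function name are our own; the paper only specifies the sub-sample-and-pair procedure.

```python
def make_cd_examples(clauses_by_type, k=5):
    """Build CD examples: k-1 seed clauses each paired with the contract
    containing the held-out target clause.

    clauses_by_type: {type: [(clause_text, contract_id), ...]}
    """
    examples = []
    for ctype, clauses in clauses_by_type.items():
        sample = clauses[:k]                  # sub-sample k clauses of this type
        *seeds, (target, contract) = sample   # k-1 seeds, 1 target clause
        for seed_text, _ in seeds:
            examples.append({"query": seed_text,
                             "contract": contract,
                             "answer": target})
    return examples

data = {"governing_law": [("clause%d" % i, "doc%d" % i) for i in range(5)]}
ex = make_cd_examples(data, k=5)
assert len(ex) == 4                           # k-1 examples per clause type
assert all(e["contract"] == "doc4" for e in ex)
```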
Evaluation Metrics Following Hendrycks et al. (2021), we use the Area Under the Precision-Recall curve (AUPR) and Precision at 80% Recall (P@0.8R) as the major evaluation metrics for CA. In CUAD, an extracted clause is regarded as a true positive if the Jaccard similarity coefficient between the clause and the ground truth meets a threshold of 0.5 (Hendrycks et al., 2021). Contract Discovery, by contrast, tends to annotate longer clauses containing some partially related sentences (examples can be found in Appendix A.2); therefore, we also regard an extracted clause as a true positive if it is a sub-string of the ground truth. For CD, we use AUPR and Soft-F1 to conduct a more fine-grained, word-level evaluation (Borchmann et al., 2020).
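The two matching criteria can be sketched as below; the token-level Jaccard computation is our reading of the CUAD criterion, and the helper names are assumptions.

```python
def jaccard(pred, gold):
    """Token-level Jaccard similarity between two clause strings."""
    a, b = set(pred.lower().split()), set(gold.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def is_true_positive(pred, gold, substring_ok=False):
    # CUAD criterion: Jaccard >= 0.5; Contract Discovery additionally
    # accepts a prediction that is a sub-string of the longer gold clause.
    if jaccard(pred, gold) >= 0.5:
        return True
    return substring_ok and pred in gold

gold = "This Agreement shall be governed by the laws of the State of New York"
assert is_true_positive("governed by the laws of the State of New York", gold)
assert is_true_positive("the laws", gold, substring_ok=True)  # sub-string rule
```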

Implementation Details
Our encoder is initialized from RoBERTa (the large version has 16 attention heads and 355M parameters), obtained from Huggingface. The reserved slot size $|r|$ is set to 30 so that most of the relational information can be filled in.
The size $|\mathcal{M}|$ of each Clause Memory partition is 10. At prediction time, we follow Hendrycks et al. (2021) and output the top $T = 20$ clauses. Recall that the query in CD is a clause, which is much longer than a clause type; we therefore set the maximum query length to 64 for CA and 256 for CD. The maximum sequence length is 512 for both models in both tasks. We follow the default learning-rate schedule and dropout settings used in RoBERTa.
We use AdamW (Loshchilov and Hutter, 2019) as our optimizer. We use grid search to find the optimal hyper-parameters, with the learning rate chosen from {1e-5, 5e-5, 1e-4} and the batch size chosen from {6, 8, 12, 16}. We additionally introduce 1.7M and 7M parameters to implement the clause retriever $f(\cdot,\cdot)$ and the fusion layer $\mathrm{Fusion}$ in ConReader. Compared to RoBERTa, these additions are almost negligible in size and hardly affect the speed. All experiments are conducted on one Titan RTX card.

Results
Conventional Setting Table 1 shows the results of CA and CD in the conventional setting. Among base-size models, ConReader-base significantly improves over all previous methods on both tasks, surpassing RoBERTa-base by 4.0 and 3.9 AUPR respectively. Among large-size models, ConReader-large exceeds RoBERTa-large by 1.7 AUPR and 5.3 P@0.8R on CA and achieves the new state of the art. Such a large improvement on P@0.8R makes the model less likely to miss important clauses that may cause huge losses, which is especially beneficial in the legal domain.
Notably, ConReader-large also exceeds DeBERTa-xlarge by 1.3 AUPR with less than half of its parameters (364M vs. 750M), demonstrating the effectiveness of our framework. Additionally, there are several notable observations: 1) As the queries in CD are clauses, they are more diverse than the 41 queries of CA, making CD the more difficult CCE task. 2) ConReader-base outperforms RoBERTa+PT-base, which implies that explicitly modeling the complexities of contracts is more valuable than learning from in-domain data in an unsupervised manner.
3) The improvements of the models designed for long text (Longformer and Hi-Transformer) are less significant than those of ConReader, suggesting that contracts pose more sophisticated issues than long text alone. In addition, Longformer favors Precision over Recall, causing its P@0.8R to be 0 in CA and its performance in CD to be low. Such a characteristic is unsuitable for CCE, which has a low tolerance for missing important clauses.

Zero-shot Setting
In Table 2, we show the results of CCE in the zero-shot setting, where users may look beyond the 41 clause types annotated in Hendrycks et al. (2021) for their particular purposes. We can observe that: 1) All models suffer a large performance drop in both tasks due to the label discrepancy between training and evaluation, which highlights the challenge of CCE in the zero-shot setting. 2) Though Longformer-base performs well in the conventional setting, it is less competitive than RoBERTa-base in the zero-shot setting; we conjecture that it sacrifices attention complexity for encoding longer text, which may limit its generalization ability.

Analysis of SCR Solver To examine the effect of the SCR Solver in depth, we implement several variants from the perspectives of gathering similar clauses (Access) and maintaining the Clause Memory (Update). As shown in Table 5, for Access, we evaluate two variants: randomly selecting a clause representation from the corresponding memory partition (w/ Random $\mathcal{M}[l_q]$) or retrieving the most similar one from the entire memory (w/ Retrieved $\mathcal{M}$). Since the first variant still selects a truly positive example (of the same type) to train the Retrieval Loss, its performance drops only marginally compared to our default design. The second variant is less effective since it cannot guarantee the retrieval of a positive example, which introduces a distracting signal into the Retrieval Loss. For Update, we replace our FIFO update strategy with random update (w/ Random Update) or stop updating when the memory is full (w/o Update). The first variant can still partially keep the clause representations up-to-date, while the second cannot, making it less effective due to stale clause representations. Overall, our default design for the SCR Solver is more effective than these variants.
Case Study Figure 3 shows the attention distribution of the start and end tokens of the ground-truth clause over the reserved slots. It offers interpretability: ConReader precisely captures the relevant relations with high attention probability. For example, the attention indicates that there is an important cue ("Section 5.3") in the No. 7 segment, and it highlights the detailed explanations of relevant terms ("Software Support and Maintenance" and "SOFTWARE") mentioned in this clause.
In addition, the start and end tokens also exhibit high correlations with the corresponding SCR start and end representations, showing that similar clauses can help determine the exact clause location.

training data on both CA and CD. These results demonstrate the great value of ConReader in maintaining comparable performance while saving annotation costs. Meanwhile, the performance trends of the two tasks indicate that there is still much room for improvement, suggesting that the current bottleneck is the lack of training data. Based on the above analysis, we believe that applying ConReader can still achieve stronger results than textual-input baselines (e.g., RoBERTa) when more data is available, and thereby further reduce the workload of end users.

Related Work
Contract Review Earlier works start by classifying lines of contracts into predefined labels, using handcrafted rules and simple machine learning methods (Curtotti and McCreath, 2010). Later works analyze contracts at a finer granularity, extracting a small set of contract elements including named entities (Chalkidis et al., 2017), parties' rights and obligations (Funaki et al., 2020), and red-flag sentences (Leivaditi et al., 2020). They release corpora for automatic contract review, allowing neural models to achieve surprising performance (Chalkidis and Androutsopoulos, 2017; Chalkidis et al., 2019). Recently, studies have paid increasing attention to CCE, which extracts clauses, complete units in contracts, and carefully selects a large number of clause types worth human attention (Borchmann et al., 2020; Wang et al., 2021b; Hendrycks et al., 2021). Due to the repetitive nature of contract language, where new contracts usually follow the templates of old contracts (Simonson et al., 2019), existing methods tend to incorporate structure information to tackle CCE. For example, Chalkidis et al. (2017) assign a fixed extraction zone for each clause type and limit clauses to be extracted from their corresponding extraction zones. Hegel et al. (2021) leverage visual cues such as document layout and placement as additional features to better understand contracts.
Retrieval & Memory Retrieval from a global memory has shown promising improvements on a variety of NLP tasks, as it can provide extra or similar knowledge. One intuitive application is open-domain QA, which intrinsically necessitates retrieving relevant knowledge from outside sources since there is no supporting information at hand (Chen et al., 2017; Karpukhin et al., 2020; Xu et al., 2021a,b). Another major application is neural machine translation with translation memory, where the memory can be either the bilingual training corpus (Feng et al., 2017; Gu et al., 2018) or a large collection of monolingual text (Cai et al., 2021). Retrieval has also received great attention in other text generation tasks, including dialogue response generation (Cai et al., 2019; Li et al., 2021) and knowledge-intensive generation (Lewis et al., 2020), as well as in information extraction tasks such as named entity recognition (Wang et al., 2021a) and relation extraction (Zhang et al., 2021).

Conclusion
We tackle Contract Clause Extraction by exploring three implicit relations in contracts. We comprehensively analyze the complexities of contracts and distill out three implicit relations. We then propose a framework, ConReader, to effectively exploit these relations for solving CCE in complex contracts. Extensive experiments show that ConReader achieves considerable improvements over existing methods on two CCE tasks in both conventional and zero-shot settings. Moreover, our interpretability analysis demonstrates that ConReader is capable of identifying the supporting knowledge that aids clause extraction.

Limitations
In this section, we discuss the limitations of this work as follows: • We employ some language-dependent methods to extract the definitions. Specifically, we use regular expressions to extract definitions from English contracts in the TDR Solver, relying on the well-organized structure of contracts. Therefore, corresponding extraction methods would have to be designed when applying our framework to legal contracts in other languages.
• To meet the needs of end users, there is still much room for improvement in CCE models. Given the limited training data in CUAD (408 contracts), it is difficult to train a robust model that can be directly used in real-life applications, especially those requiring zero-shot transfer capability. It would therefore be beneficial to collect more training data to satisfy industrial requirements. In addition, the low-resource setting is a promising and practical direction for future studies.

Ethics Statement
The main purpose of CCE is to reduce the tedious effort legal professionals spend finding needles in a haystack. It only serves to highlight potential clauses for human attention; legal professionals still need to check the quality of those clauses before continuing to the final contract review (still human work). In fact, we use P@0.8R as one of our evaluation metrics because it is quite strict and meets the needs of legal professionals.
We also conduct a zero-shot experiment to demonstrate that the benefit of ConReader does not come from learning biased information and that it generalizes well. We use publicly available CCE corpora to train and evaluate ConReader. The parties in these contracts are mostly companies, which do not involve gender or race issues. Some confidential information was originally redacted to protect the confidentiality of the parties involved; such redactions may show up as asterisks (***), underscores (___), or blank spaces. We manually identify and annotate all definitions in those contracts. Such definitions are well structured and require little legal knowledge. These annotations only serve to verify the effectiveness of the TDR Solver in ConReader and do not constitute a new dataset; we can release the annotated definitions for reproduction of our analysis if necessary. We report all preprocessing procedures, hyper-parameters, evaluation schemes, and other technical details, and will release our code for reproduction (some details are moved to the Appendix due to space limitations).

A.1 Data Statistics
We show the dataset statistics in Table 6. CUAD annotates 41 types of clauses that lawyers need to pay attention to when reviewing contracts, such as "Governing Law", "Agreement Date", "License Grant", and "Insurance". Contract Discovery annotates another 21 types of clauses that must be well understood by the legal annotators, including "Trustee Appointment", "Income Summary", and "Auditor Opinion". The two datasets differ substantially in their annotated types, making Contract Discovery a good resource for zero-shot experiments. To prepare a truly zero-shot setting, we further remove the 6 clause types annotated in both corpora: change of control covenant, change of control notice, governing law, no solicitation, effective date reference, and effective date main.
Since most content in contracts is unlabeled, there is a large imbalance between extractable and non-extractable segments. A CCE model trained on this imbalanced data is likely to output empty spans, as the non-extractable segments teach it not to extract clauses. Therefore, we follow Hendrycks et al. (2021) and down-sample contract segments that do not contain any relevant clauses in the training set, so that extractable and non-extractable segments are approximately balanced (i.e., 1:1). In the test sets, we keep all non-extractable segments, which explains why the test sets have fewer contracts but more segments.
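The balancing step can be sketched as keeping every extractable segment and sub-sampling the rest to roughly 1:1. The function name and the (text, label) pair layout are illustrative assumptions.

```python
import random

def balance_segments(segments, seed=0):
    """segments: list of (segment_text, has_clause) pairs.

    Keep all extractable segments; sub-sample non-extractable ones
    so the two groups are approximately 1:1.
    """
    pos = [s for s in segments if s[1]]
    neg = [s for s in segments if not s[1]]
    rng = random.Random(seed)
    neg = rng.sample(neg, min(len(neg), len(pos)))
    return pos + neg

data = [("seg%d" % i, i < 3) for i in range(10)]  # 3 extractable, 7 not
balanced = balance_segments(data)
assert sum(1 for _, y in balanced if y) == 3
assert sum(1 for _, y in balanced if not y) == 3  # down-sampled to 1:1
```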

A.2 Annotation Difference
Table 7 shows the annotation difference between CUAD and Contract Discovery on "Governing Law" clauses. Contract Discovery tends to annotate more facts into the clause, such as parties' obligations. Due to this annotation difference, in the zero-shot setting we also regard an extracted clause as a true positive when calculating AUPR if it is a sub-string of the ground truth.

A.3 Performance by Type
Figure 5 shows the AUPR scores for each clause type of ConReader and RoBERTa.

CUAD
• This Agreement shall be construed in accordance with and governed by the substantive internal laws of the State of New York.
• This Agreement shall be governed by the laws of the State of New York, without giving effect to its principles of conflicts of laws, other than Section 5-1401 of the New York General Obligations Law.
• This Agreement is subject to and shall be construed in accordance with the laws of the Commonwealth of Virginia with jurisdiction and venue in federal and Virginia courts in Alexandria and Arlington, Virginia.

Contract Discovery
Section 4.8 Choice of Law/Venue. This Agreement will be governed by and construed and enforced in accordance with the internal laws of the State of California, without giving effect to the conflict of laws principles thereof. Each Party hereby submits to personal jurisdiction before any court of proper subject matter jurisdiction located in Los Angeles, California, to enforce the terms of this Agreement and waives any and all objections to the jurisdiction and proper venue of such courts. This Agreement will be governed by and construed in accordance with the laws of the State of Delaware (without giving effect to principles of conflicts of laws). Each Party: (a) irrevocably and unconditionally consents and submits to the jurisdiction of the state and federal courts located in the State of Delaware for purposes of any action, suit or proceeding arising out of or relating to this Agreement;

Figure 1 :
Figure 1: An overview of the contract structure and the CCE process. The left half illustrates three implicit relations widely found in contracts. The right half shows the two tasks of CCE.

Figure 2 :
Figure 2: Overview of ConReader. Three solvers are used to obtain relevant information, and an Aggregator is used to fuse all information into the text representations for semantic enrichment. IR denotes the retrieval process.
Update clause memory;
for extractable clause k = 1, 2, ..., K do
    Get clause representation h_{y_k} via Equation (4);
    if memory partition M[l_q] is full then
        Remove the earliest clause representation;
    end
    Append h_{y_k} to M[l_q];
end
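The clause-memory update above amounts to maintaining one bounded first-in-first-out buffer per clause type. A minimal Python sketch, where the capacity is an assumed hyperparameter and the representations are stand-ins for the vectors h_{y_k}:

```python
from collections import defaultdict, deque

class ClauseMemory:
    """Sketch of the clause-memory update: one bounded FIFO partition
    per clause type l_q. When a partition is full, the earliest stored
    clause representation is removed before the new one is appended.
    The capacity is an assumed hyperparameter."""

    def __init__(self, capacity: int):
        # deque(maxlen=...) automatically drops the earliest item when full,
        # which matches the "remove earliest, then append" steps.
        self.partitions = defaultdict(lambda: deque(maxlen=capacity))

    def update(self, clause_type, representation):
        self.partitions[clause_type].append(representation)
```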

Figure 3 :
Figure 3: Case study of the attention distribution of a clause over its relevant information.

Table 1 :
Model Comparisons in the conventional setting.

Table 3 :
Ablation studies in the conventional setting.

Table 4 :
Analysis of TDR Solver. Definition statistics: F1@D denotes F1 on the definition level and Acc@C denotes the accuracy on the contract level. The statistics of the extracted definitions are shown in Table 4. Specifically, more than half of the contracts contain definitions (290/408 for training, 65/102 for test), and our rule-based extraction can correctly extract definitions for most of them. In addition, the results in Table 4(b) show that our extracted definitions (+Auto) are capable of improving the ability of baseline models to extract clauses by enhancing the representations of legal terms, and their benefits are almost the same as those of the ground-truth definitions (+Manual).

Table 5 :
AUPR on different variants of SCR Solver.

Table 6 :
Dataset statistics for CA and CD.

Table 7 :
Examples of annotation of "Governing Law" clauses in two datasets.