Detecting Paraphrases of Standard Clause Titles in Insurance Contracts
Frieda Josi | Christian Wartena | Ulrich Heid
RELATIONS - Workshop on meaning relations between phrases and sentences

Validated model texts, such as model clauses, can be used in the analysis of contract texts to identify reused contract clauses. This paper investigates how to calculate the similarity between titles of model clauses and headings extracted from contracts, and which similarity measure is most suitable for this task. To compute the similarity of title pairs, we tested several variants of string similarity and token-based similarity. We also compare two more semantic similarity measures based on word embeddings, using both pretrained embeddings and word embeddings trained on contract texts. Identifying the model clause title can serve as a starting point for mapping clauses found in contracts to verified clauses.
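
As an illustration of the two surface-level families mentioned above, the following minimal sketch compares a contract heading against a list of model clause titles. It is not the authors' implementation: the abstract does not name the exact measures, so `difflib`'s ratio stands in for a string similarity and Jaccard overlap for a token-based similarity. The function names and the example titles are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Tuple


def string_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (difflib ratio, an assumed stand-in)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of the lowercased whitespace-token sets (assumed measure)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty titles are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def best_match(
    heading: str,
    model_titles: List[str],
    measure: Callable[[str, str], float],
) -> Tuple[str, float]:
    """Return the model clause title with the highest similarity to the heading."""
    scored = [(title, measure(heading, title)) for title in model_titles]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical model clause titles and contract heading, for illustration only.
MODEL_TITLES = ["Scope of Cover", "Termination", "Premium Payment"]
match = best_match("Scope of insurance cover", MODEL_TITLES, token_jaccard)
```

Passing `string_similarity` instead of `token_jaccard` as the `measure` argument switches between the two families without changing the matching logic, which makes it straightforward to compare measures on the same set of title pairs.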