RELATIONS - Workshop on meaning relations between phrases and sentences
Venelin Kovatchev, Darina Gold, Torsten Zesch (Editors)
- Anthology ID:
- W19-08
- Month:
- May
- Year:
- 2019
- Address:
- Gothenburg, Sweden
- Venue:
- IWCS
- SIG:
- SIGSEM
- Publisher:
- Association for Computational Linguistics
- URL:
- https://aclanthology.org/W19-08/
- DOI:
- PDF:
- https://aclanthology.org/W19-08.pdf
Assessing the Difficulty of Classifying ConceptNet Relations in a Multi-Label Classification Setting
Maria Becker, Michael Staniek, Vivi Nastase, Anette Frank
Commonsense knowledge relations are crucial for advanced NLU tasks. We examine the learnability of such relations as represented in ConceptNet, taking into account the specific properties that can make relation classification difficult: a given concept pair can be linked by multiple relation types, and relations can have multi-word arguments of diverse semantic types. We explore a neural open-world multi-label classification approach that focuses on evaluating classification accuracy for individual relations. Based on an in-depth study of the specific properties of the ConceptNet resource, we investigate the impact of different relation representations and model variations. Our analysis reveals that the complexity of argument types and relation ambiguity are the most important challenges to address. To address the incompleteness of the resource, we design a customized evaluation method that can be expanded in future work.
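As a rough illustration of what evaluating individual relations in a multi-label setting can look like, the sketch below scores each relation separately. It is not the authors' evaluation method: the threshold, the function names, and the toy data are assumptions made for illustration. Only the relation names (`IsA`, `UsedFor`) come from ConceptNet.

```python
from typing import Dict, List, Set, Tuple


def predict_labels(scores: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Keep every relation whose score reaches the threshold.

    A concept pair may receive several relations, or none at all
    (the latter is one simple way to model an open-world setting).
    The threshold value is an assumption, not taken from the paper.
    """
    return {rel for rel, score in scores.items() if score >= threshold}


def per_relation_prf(
    gold: List[Set[str]], pred: List[Set[str]], relations: List[str]
) -> Dict[str, Tuple[float, float, float]]:
    """Compute (precision, recall, F1) separately for each relation."""
    results = {}
    for rel in relations:
        tp = sum(1 for g, p in zip(gold, pred) if rel in g and rel in p)
        fp = sum(1 for g, p in zip(gold, pred) if rel not in g and rel in p)
        fn = sum(1 for g, p in zip(gold, pred) if rel in g and rel not in p)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        results[rel] = (precision, recall, f1)
    return results


# Toy gold and predicted label sets for three concept pairs (hypothetical data).
gold = [{"IsA"}, {"IsA", "UsedFor"}, set()]
pred = [{"IsA"}, {"UsedFor"}, {"IsA"}]
report = per_relation_prf(gold, pred, ["IsA", "UsedFor"])
```

Reporting one score triple per relation, rather than a single aggregate, makes it visible which relation types are hard to learn.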
Detecting Collocations Similarity via Logical-Linguistic Model
Nina Khairova, Svitlana Petrasova, Orken Mamyrbayev, Kuralay Mukhsina
Semantic similarity between collocations, like similarity between words, is a central issue in NLP that must be addressed, in particular, to facilitate automatic thesaurus generation. In this paper, we consider a logical-linguistic model that defines the relation of semantic similarity between collocations via logical-algebraic equations. We provide the model for English, Ukrainian and Russian text corpora. The implementations differ slightly by language, both in the equations of the algebra of finite predicates and in the linguistic resources used. As datasets for our experiment, we use the 5,801 sentence pairs of the Microsoft Research Paraphrase Corpus for English and more than 1,000 scientific papers for Russian and Ukrainian.
Detecting Paraphrases of Standard Clause Titles in Insurance Contracts
Frieda Josi, Christian Wartena, Ulrich Heid
For the analysis of contract texts, validated model texts, such as model clauses, can be used to identify reused contract clauses. This paper investigates how to calculate the similarity between titles of model clauses and headings extracted from contracts, and which similarity measure is most suitable for this. To compute the similarity between title pairs, we tested several variants of string similarity and token-based similarity. We also compare two more semantic similarity measures based on word embeddings: one using pretrained embeddings and one using embeddings trained on contract texts. The identification of the model clause title can be used as a starting point for mapping clauses found in contracts to verified clauses.
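The difference between a string-based and a token-based measure can be sketched as follows. This is a minimal illustration, not the measures evaluated in the paper: it assumes `difflib.SequenceMatcher` as the string measure and Jaccard overlap as the token measure, and the titles are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Callable, List


def string_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_jaccard(a: str, b: str) -> float:
    """Share of whitespace-separated tokens that both titles contain."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty titles are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def best_match(
    heading: str, model_titles: List[str], measure: Callable[[str, str], float]
) -> str:
    """Return the model clause title most similar to a contract heading."""
    return max(model_titles, key=lambda title: measure(heading, title))


# Hypothetical model clause titles and a heading extracted from a contract.
model_titles = ["Exclusions", "Scope of insurance cover", "Premium payment"]
match = best_match("Scope of Cover", model_titles, token_jaccard)
```

Passing the measure as a parameter keeps the matching step fixed, so different similarity measures can be compared on the same title pairs.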
Semantic Matching of Documents from Heterogeneous Collections: A Simple and Transparent Method for Practical Applications
Mark-Christoph Mueller
We present a very simple, unsupervised method for the pairwise matching of documents from heterogeneous collections. We demonstrate our method with the Concept-Project matching task, which is a binary classification task involving pairs of documents from heterogeneous collections. Although our method only employs standard resources without any domain- or task-specific modifications, it clearly outperforms the more complex system of the original authors. In addition, our method is transparent, because it provides explicit information about how a similarity score was computed, and efficient, because it is based on the aggregation of (pre-computable) word-level similarities.
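One way to aggregate word-level similarities into a transparent document-level score is sketched below. It is not the author's exact method: the best-alignment averaging, the threshold, the function names, and the toy vectors are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Alignment = Tuple[str, str, float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def directed_score(
    source: List[str], target: List[str], vectors: Dict[str, Sequence[float]]
) -> Tuple[float, List[Alignment]]:
    """Average, over source words, of the best similarity to any target word.

    The returned alignments record which word pairs produced the score.
    """
    source = [w for w in source if w in vectors]
    target = [w for w in target if w in vectors]
    if not source or not target:
        return 0.0, []
    alignments = []
    for word in source:
        best = max(target, key=lambda t: cosine(vectors[word], vectors[t]))
        alignments.append((word, best, cosine(vectors[word], vectors[best])))
    score = sum(sim for _, _, sim in alignments) / len(alignments)
    return score, alignments


def match_documents(
    doc_a: List[str],
    doc_b: List[str],
    vectors: Dict[str, Sequence[float]],
    threshold: float = 0.5,  # assumed decision threshold
) -> Tuple[bool, float, List[Alignment]]:
    """Binary match decision plus the evidence behind it."""
    score_ab, align_ab = directed_score(doc_a, doc_b, vectors)
    score_ba, align_ba = directed_score(doc_b, doc_a, vectors)
    score = (score_ab + score_ba) / 2  # symmetrize both directions
    return score >= threshold, score, align_ab + align_ba


# Toy two-dimensional word vectors (hypothetical; real ones would be pretrained).
vectors = {"energy": [1.0, 0.0], "power": [1.0, 0.0], "poem": [0.0, 1.0]}
is_match, score, evidence = match_documents(["energy"], ["power"], vectors)
```

Because the word-level similarities do not depend on the document pair, they could be computed once and cached, and the alignment list shows how each score came about.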