Semantic Matching of Documents from Heterogeneous Collections: A Simple and Transparent Method for Practical Applications

Mark-Christoph Mueller


Abstract
We present a very simple, unsupervised method for the pairwise matching of documents from heterogeneous collections. We demonstrate our method with the Concept-Project matching task, which is a binary classification task involving pairs of documents from heterogeneous collections. Although our method only employs standard resources without any domain- or task-specific modifications, it clearly outperforms the more complex system of the original authors. In addition, our method is transparent, because it provides explicit information about how a similarity score was computed, and efficient, because it is based on the aggregation of (pre-computable) word-level similarities.
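The abstract describes scoring a document pair by aggregating word-level similarities while keeping the computation inspectable. The sketch below illustrates that general idea only. It is not the paper's scoring function: the greedy best-match alignment, the averaging step, the function names `cosine` and `match_documents`, and the two-dimensional `TOY_VECTORS` are all assumptions made for illustration. In practice the vectors would come from a standard pre-trained embedding resource.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def match_documents(doc_a, doc_b, vectors):
    """Score a document pair and report which word pairs produced the score.

    Assumed aggregation (hypothetical, for illustration): each known word in
    doc_a is aligned to its most similar known word in doc_b, and the document
    score is the mean of those best-match similarities.
    """
    words_a = [w for w in doc_a if w in vectors]
    words_b = [w for w in doc_b if w in vectors]
    if not words_a or not words_b:
        return 0.0, []
    alignments = []
    for wa in words_a:
        best = max(words_b, key=lambda wb: cosine(vectors[wa], vectors[wb]))
        alignments.append((wa, best, cosine(vectors[wa], vectors[best])))
    score = sum(sim for _, _, sim in alignments) / len(alignments)
    return score, alignments

# Toy embeddings, invented for this example only.
TOY_VECTORS = {
    "force": [1.0, 0.0],
    "motion": [0.8, 0.6],
    "rocket": [0.6, 0.8],
    "cake": [0.0, 1.0],
}

if __name__ == "__main__":
    score, alignments = match_documents(
        ["force", "motion"], ["rocket", "cake"], TOY_VECTORS
    )
    print(round(score, 3))
    for word_a, word_b, sim in alignments:
        print(word_a, "->", word_b, round(sim, 3))
```

The returned alignment list is what makes the score inspectable: every contribution to the final number is a named word pair with its own similarity. Because word-to-word similarities depend only on the vocabulary, they could also be cached ahead of time rather than recomputed per document pair.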
Anthology ID:
W19-0804
Volume:
RELATIONS - Workshop on meaning relations between phrases and sentences
Month:
May
Year:
2019
Address:
Gothenburg, Sweden
Editors:
Venelin Kovatchev, Darina Gold, Torsten Zesch
Venue:
IWCS
SIG:
SIGSEM
Publisher:
Association for Computational Linguistics
URL:
https://aclanthology.org/W19-0804
DOI:
10.18653/v1/W19-0804
Cite (ACL):
Mark-Christoph Mueller. 2019. Semantic Matching of Documents from Heterogeneous Collections: A Simple and Transparent Method for Practical Applications. In RELATIONS - Workshop on meaning relations between phrases and sentences, Gothenburg, Sweden. Association for Computational Linguistics.
Cite (Informal):
Semantic Matching of Documents from Heterogeneous Collections: A Simple and Transparent Method for Practical Applications (Mueller, IWCS 2019)
PDF:
https://aclanthology.org/W19-0804.pdf