Shortest-Path Graph Kernels for Document Similarity

Giannis Nikolentzos, Polykarpos Meladianos, François Rousseau, Yannis Stavrakas, Michalis Vazirgiannis


Abstract
In this paper, we present a novel document similarity measure based on the definition of a graph kernel between pairs of documents. The proposed measure takes into account both the terms contained in the documents and the relationships between them. By representing each document as a graph-of-words, we are able to model these relationships and then determine how similar two documents are by using a modified shortest-path graph kernel. We evaluate our approach on two tasks and compare it against several baseline approaches using performance metrics such as DET curves and macro-average F1-score. Experimental results on a range of datasets show that our proposed approach outperforms traditional techniques and measures the similarity between two documents more accurately.
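To make the idea in the abstract concrete, the following is a minimal Python sketch of a shortest-path-style similarity over graphs-of-words. It is an illustration under stated assumptions, not the authors' exact formulation: the co-occurrence window size, the depth limit on paths, the 1/length path weighting, and the cosine-style normalization are all choices made here for the example, and the function names (`graph_of_words`, `shortest_path_weights`, `raw_kernel`, `similarity`) are hypothetical.

```python
from collections import deque
from math import sqrt


def graph_of_words(tokens, window=2):
    """Undirected co-occurrence graph: each distinct term is a node, and two
    terms are linked if they occur within `window` consecutive tokens.
    (The window size is an assumption of this sketch.)"""
    adj = {t: set() for t in tokens}
    for i, t in enumerate(tokens):
        for u in tokens[i + 1 : i + window]:
            if u != t:
                adj[t].add(u)
                adj[u].add(t)
    return adj


def shortest_path_weights(adj, max_depth=2):
    """Map each unordered term pair to 1 / (shortest-path length), using a
    breadth-first search that stops expanding at `max_depth` hops."""
    weights = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            if dist[node] == max_depth:
                continue  # do not expand beyond the depth limit
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        for target, d in dist.items():
            if d > 0:
                weights[frozenset((source, target))] = 1.0 / d
    return weights


def raw_kernel(doc_a, doc_b):
    """Unnormalized score: number of shared terms plus, for every term pair
    connected in both graphs, the product of the two path weights."""
    terms_a, w_a = doc_a
    terms_b, w_b = doc_b
    shared_terms = len(terms_a & terms_b)
    shared_paths = sum(w * w_b[p] for p, w in w_a.items() if p in w_b)
    return shared_terms + shared_paths


def similarity(tokens_a, tokens_b, window=2, max_depth=2):
    """Cosine-style normalized similarity in [0, 1] (normalization is an
    assumption of this sketch)."""
    docs = []
    for tokens in (tokens_a, tokens_b):
        adj = graph_of_words(tokens, window)
        docs.append((set(adj), shortest_path_weights(adj, max_depth)))
    a, b = docs
    denom = sqrt(raw_kernel(a, a) * raw_kernel(b, b))
    return raw_kernel(a, b) / denom if denom else 0.0


doc1 = "graph kernels measure document similarity".split()
doc2 = "graph kernels compare documents".split()
doc3 = "weather forecast tomorrow".split()
score_related = similarity(doc1, doc2)
score_unrelated = similarity(doc1, doc3)
score_self = similarity(doc1, doc1)
```

In this sketch, two documents score higher when they share not only terms but also pairs of terms that lie close together in both graphs, which is the kind of relationship a plain bag-of-words comparison ignores.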
Anthology ID: D17-1202
Volume: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
Month: September
Year: 2017
Address: Copenhagen, Denmark
Editors: Martha Palmer, Rebecca Hwa, Sebastian Riedel
Venue: EMNLP
SIG: SIGDAT
Publisher: Association for Computational Linguistics
Pages: 1890–1900
URL: https://aclanthology.org/D17-1202
DOI: 10.18653/v1/D17-1202
Cite (ACL): Giannis Nikolentzos, Polykarpos Meladianos, François Rousseau, Yannis Stavrakas, and Michalis Vazirgiannis. 2017. Shortest-Path Graph Kernels for Document Similarity. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1890–1900, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal): Shortest-Path Graph Kernels for Document Similarity (Nikolentzos et al., EMNLP 2017)
PDF: https://aclanthology.org/D17-1202.pdf
Data: WebKB