Virtual Citation Proximity (VCP): Empowering Document Recommender Systems by Learning a Hypothetical In-Text Citation-Proximity Metric for Uncited Documents

Paul Molloy, Joeran Beel, Akiko Aizawa


Abstract
The relatedness of research articles, patents, court rulings, web pages, and other document types is often calculated with citation- or hyperlink-based approaches such as co-citation (proximity) analysis. The main limitation of citation-based approaches is that they cannot be used for documents that receive few or no citations. We propose Virtual Citation Proximity (VCP), a Siamese Neural Network architecture that combines the advantages of co-citation proximity analysis (diverse notions of relatedness, high recommendation performance) with the advantage of content-based filtering (high coverage). VCP is trained on a corpus of documents with textual features, using real citation proximity as ground truth. For any two documents, VCP then predicts from their titles and abstracts the proximity at which they would be co-cited if they were in fact co-cited. The prediction can be used in the same way as real citation proximity to calculate document relatedness, even for uncited documents. In our evaluation with 2 million co-citations from Wikipedia articles, VCP achieves an MAE of 0.0055, i.e., an improvement of 20% over the baseline, though the learning curve suggests that more work is needed.
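As a rough illustration of the idea in the abstract, the sketch below shows a Siamese-style forward pass: one shared encoder is applied to both documents, the two encodings are combined, and a regression head outputs a proximity score that can be scored with MAE. This is not the authors' implementation. The feature size, the single tanh layer, the absolute-difference combination, the sigmoid output range, and every name in the code are assumptions made for the example, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 16   # hypothetical size of the input text features
HIDDEN_DIM = 8   # hypothetical size of the shared encoding

# One set of encoder weights, shared by both branches (the "Siamese" part).
W_enc = rng.normal(scale=0.1, size=(EMBED_DIM, HIDDEN_DIM))
b_enc = np.zeros(HIDDEN_DIM)
# Regression head mapping the combined pair representation to one score.
w_out = rng.normal(scale=0.1, size=HIDDEN_DIM)
b_out = 0.0


def encode(x):
    """Encode a batch of document feature vectors with the shared weights."""
    return np.tanh(x @ W_enc + b_enc)


def predict_proximity(doc_a, doc_b):
    """Predict a proximity score in (0, 1) for each document pair."""
    # Assumption: combine the branches by absolute difference, which makes
    # the score independent of the order of the two documents.
    pair = np.abs(encode(doc_a) - encode(doc_b))
    return 1.0 / (1.0 + np.exp(-(pair @ w_out + b_out)))


def mean_absolute_error(y_true, y_pred):
    """Mean absolute error, the metric named in the abstract."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


# Toy batch: 4 random stand-ins for "title + abstract" feature vectors per side.
docs_a = rng.normal(size=(4, EMBED_DIM))
docs_b = rng.normal(size=(4, EMBED_DIM))
scores = predict_proximity(docs_a, docs_b)
```

In a trained system, the predicted scores would be compared against real citation-proximity values with `mean_absolute_error`, and the weights would be fitted to reduce that error.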
Anthology ID:
2020.wosp-1.1
Volume:
Proceedings of the 8th International Workshop on Mining Scientific Publications
Month:
05 August
Year:
2020
Address:
Wuhan, China
Editors:
Petr Knoth, Christopher Stahl, Bikash Gyawali, David Pride, Suchetha N. Kunnath, Drahomira Herrmannova
Venue:
WOSP
Publisher:
Association for Computational Linguistics
Pages:
1–8
URL:
https://aclanthology.org/2020.wosp-1.1
Cite (ACL):
Paul Molloy, Joeran Beel, and Akiko Aizawa. 2020. Virtual Citation Proximity (VCP): Empowering Document Recommender Systems by Learning a Hypothetical In-Text Citation-Proximity Metric for Uncited Documents. In Proceedings of the 8th International Workshop on Mining Scientific Publications, pages 1–8, Wuhan, China. Association for Computational Linguistics.
Cite (Informal):
Virtual Citation Proximity (VCP): Empowering Document Recommender Systems by Learning a Hypothetical In-Text Citation-Proximity Metric for Uncited Documents (Molloy et al., WOSP 2020)
PDF:
https://aclanthology.org/2020.wosp-1.1.pdf