Scalable Knowledge Graph Construction from Text Collections

Ryan Clancy, Ihab F. Ilyas, Jimmy Lin


Abstract
We present a scalable, open-source platform that “distills” a potentially large text collection into a knowledge graph. Our platform takes documents stored in Apache Solr and scales out the Stanford CoreNLP toolkit via Apache Spark integration to extract mentions and relations, which are ingested into the Neo4j graph database. This raw knowledge graph is then enriched with facts extracted from an external knowledge graph. The complete product can be manipulated by various applications using Neo4j’s native Cypher query language. We present a subgraph-matching approach to align extracted relations with external facts and show that fact verification, locating textual support for asserted facts, detecting inconsistent and missing facts, and extracting distantly supervised training data can all be performed within the same framework.
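The alignment idea in the abstract can be illustrated with a minimal in-memory sketch. This is not the authors' implementation: the platform described above expresses the matching as Cypher queries over Neo4j, whereas the sketch below stands in plain Python tuples for graph edges. The function name `align`, the tuple layouts, the `functional_relations` parameter, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def align(extracted, external, functional_relations):
    """Align extracted relations with external facts (illustrative sketch only).

    extracted: iterable of (subject, relation, object, doc_id) tuples,
        standing in for relations extracted from text.
    external: iterable of (subject, relation, object) tuples,
        standing in for facts from an external knowledge graph.
    functional_relations: relations assumed to admit a single object,
        so a different object counts as a conflict (hypothetical parameter).
    """
    # Index external facts by (subject, relation) for direct lookup.
    external_index = defaultdict(set)
    for s, r, o in external:
        external_index[(s, r)].add(o)

    support = defaultdict(list)  # external fact -> documents that assert it
    inconsistent = []            # extractions that contradict an external fact
    missing = []                 # extractions with no matching external fact

    for s, r, o, doc_id in extracted:
        known = external_index.get((s, r), set())
        if o in known:
            support[(s, r, o)].append(doc_id)
        elif known and r in functional_relations:
            inconsistent.append((s, r, o, doc_id))
        else:
            missing.append((s, r, o, doc_id))

    return dict(support), inconsistent, missing


# Toy data, invented for this example.
external_facts = [("Ada Lovelace", "born_in", "London")]
extracted_relations = [
    ("Ada Lovelace", "born_in", "London", "doc-1"),
    ("Ada Lovelace", "born_in", "Paris", "doc-2"),
    ("Ada Lovelace", "field", "mathematics", "doc-3"),
]

support, inconsistent, missing = align(
    extracted_relations, external_facts, functional_relations={"born_in"}
)
```

Under these assumptions, `support` maps each matched external fact to the documents that assert it, which covers both fact verification and locating textual support, and the same fact-document pairs could serve as distantly supervised training examples. `inconsistent` and `missing` collect the extractions that conflict with, or are absent from, the external facts.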
Anthology ID:
D19-6607
Volume:
Proceedings of the Second Workshop on Fact Extraction and VERification (FEVER)
Month:
November
Year:
2019
Address:
Hong Kong, China
Editors:
James Thorne, Andreas Vlachos, Oana Cocarascu, Christos Christodoulopoulos, Arpit Mittal
Venue:
WS
Publisher:
Association for Computational Linguistics
Pages:
39–46
URL:
https://aclanthology.org/D19-6607
DOI:
10.18653/v1/D19-6607
Cite (ACL):
Ryan Clancy, Ihab F. Ilyas, and Jimmy Lin. 2019. Scalable Knowledge Graph Construction from Text Collections. In Proceedings of the Second Workshop on Fact Extraction and VERification (FEVER), pages 39–46, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal):
Scalable Knowledge Graph Construction from Text Collections (Clancy et al., 2019)
PDF:
https://aclanthology.org/D19-6607.pdf