A High Precision Pipeline for Financial Knowledge Graph Construction

Sarah Elhammadi, Laks V.S. Lakshmanan, Raymond Ng, Michael Simpson, Baoxing Huai, Zhefeng Wang, Lanjun Wang


Abstract
Motivated by applications such as question answering, fact checking, and data integration, there is significant interest in constructing knowledge graphs by extracting information from unstructured sources, particularly text documents. Knowledge graphs have emerged as a standard for structured knowledge representation, whereby entities and their inter-relations are represented and conveniently stored as (subject, predicate, object) triples in a graph that can be used to power various downstream applications. The proliferation of financial news sources reporting on companies, markets, currencies, and stocks presents an opportunity for extracting valuable knowledge about this crucial domain. In this paper, we focus on constructing a knowledge graph automatically by information extraction from a large corpus of financial news articles. For that purpose, we develop a high-precision knowledge extraction pipeline tailored for the financial domain. This pipeline combines multiple information extraction techniques with a financial dictionary that we built, all working together to produce over 342,000 compact extractions from over 288,000 financial news articles, with a precision of 78% at the top-100 extractions. The extracted triples are stored in a knowledge graph, making them readily available for use in downstream applications.
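As an illustration of the two ideas the abstract relies on, the following minimal Python sketch stores (subject, predicate, object) triples in a small in-memory graph and computes precision over the top-k ranked extractions. It is not the authors' pipeline: the names `Triple`, `KnowledgeGraph`, and `precision_at_k` are hypothetical, the company names are invented, and precision at k is assumed to mean the fraction of the k highest-ranked extractions judged correct, which may differ from the paper's exact evaluation protocol.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One extracted fact; field names are illustrative, not from the paper."""
    subject: str
    predicate: str
    obj: str


class KnowledgeGraph:
    """Toy adjacency-list store: subject -> set of (predicate, object) edges."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add(self, triple):
        # Using a set means a repeated extraction is stored only once.
        self._edges[triple.subject].add((triple.predicate, triple.obj))

    def objects(self, subject, predicate):
        # Look up without creating an empty entry for unknown subjects.
        edges = self._edges.get(subject, set())
        return sorted(o for p, o in edges if p == predicate)


def precision_at_k(judgements, k):
    """Fraction of the top-k ranked extractions judged correct.

    `judgements` is a list of booleans ordered by extraction rank
    (assumed definition; labels would come from human annotation).
    """
    top = judgements[:k]
    if not top:
        raise ValueError("need at least one judged extraction")
    return sum(top) / len(top)


# Invented example facts, purely for illustration.
kg = KnowledgeGraph()
kg.add(Triple("AcmeCorp", "acquired", "WidgetCo"))
kg.add(Triple("AcmeCorp", "acquired", "WidgetCo"))  # duplicate, ignored
kg.add(Triple("AcmeCorp", "listed_on", "NYSE"))

print(kg.objects("AcmeCorp", "acquired"))
print(precision_at_k([True, True, False, True], k=4))
```

Under this definition, a precision of 78% at the top-100 extractions would correspond to 78 of the 100 highest-ranked triples being judged correct.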
Anthology ID:
2020.coling-main.84
Volume:
Proceedings of the 28th International Conference on Computational Linguistics
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Donia Scott, Nuria Bel, Chengqing Zong
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
967–977
URL:
https://aclanthology.org/2020.coling-main.84
DOI:
10.18653/v1/2020.coling-main.84
Cite (ACL):
Sarah Elhammadi, Laks V.S. Lakshmanan, Raymond Ng, Michael Simpson, Baoxing Huai, Zhefeng Wang, and Lanjun Wang. 2020. A High Precision Pipeline for Financial Knowledge Graph Construction. In Proceedings of the 28th International Conference on Computational Linguistics, pages 967–977, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal):
A High Precision Pipeline for Financial Knowledge Graph Construction (Elhammadi et al., COLING 2020)
PDF:
https://aclanthology.org/2020.coling-main.84.pdf