SciConceptMiner: A system for large-scale scientific concept discovery

Zhihong Shen, Chieh-Han Wu, Li Ma, Chien-Pang Chen, Kuansan Wang


Abstract
Scientific knowledge is evolving at an unprecedented rate, with new concepts constantly being introduced from millions of academic articles published every month. In this paper, we introduce a self-supervised end-to-end system, SciConceptMiner, for the automatic capture of emerging scientific concepts from both independent knowledge sources (semi-structured data) and academic publications (unstructured documents). First, we adopt a BERT-based sequence labeling model to predict candidate concept phrases with self-supervision data. Then, we incorporate rich Web content for synonym detection and concept selection via a web search API. This two-stage approach achieves highly accurate (94.7%) concept identification with more than 740K scientific concepts. These concepts are deployed in the Microsoft Academic production system and are the backbone for its semantic search capability.
Anthology ID:
2021.acl-demo.6
Volume:
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations
Month:
August
Year:
2021
Address:
Online
Editors:
Heng Ji, Jong C. Park, Rui Xia
Venues:
ACL | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
48–54
URL:
https://aclanthology.org/2021.acl-demo.6
DOI:
10.18653/v1/2021.acl-demo.6
Cite (ACL):
Zhihong Shen, Chieh-Han Wu, Li Ma, Chien-Pang Chen, and Kuansan Wang. 2021. SciConceptMiner: A system for large-scale scientific concept discovery. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations, pages 48–54, Online. Association for Computational Linguistics.
Cite (Informal):
SciConceptMiner: A system for large-scale scientific concept discovery (Shen et al., ACL-IJCNLP 2021)
PDF:
https://aclanthology.org/2021.acl-demo.6.pdf
Video:
https://aclanthology.org/2021.acl-demo.6.mp4
Data
Microsoft Academic Graph