Zhixiong Zhang


2024

SparkRA: A Retrieval-Augmented Knowledge Service System Based on Spark Large Language Model
Dayong Wu | Jiaqi Li | Baoxin Wang | Honghong Zhao | Siyuan Xue | Yanjie Yang | Zhijun Chang | Rui Zhang | Li Qian | Bo Wang | Shijin Wang | Zhixiong Zhang | Guoping Hu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Large language models (LLMs) have shown remarkable achievements across various language tasks. To enhance the performance of LLMs in scientific literature services, we developed the scientific literature LLM (SciLit-LLM) through pre-training and supervised fine-tuning on scientific literature, building upon the iFLYTEK Spark LLM. Furthermore, we present a knowledge service system Spark Research Assistant (SparkRA) based on our SciLit-LLM. SparkRA is accessible online and provides three primary functions: literature investigation, paper reading, and academic writing. As of July 30, 2024, SparkRA has garnered over 50,000 registered users, with a total usage count exceeding 1.3 million.

2020

Representing and Reconstructing PhySH: Which Embedding Competent?
Xiaoli Chen | Zhixiong Zhang
Proceedings of the 8th International Workshop on Mining Scientific Publications

Recent advances in natural language processing have made embedding representations dominate the computing language world. Though this is taken for granted, we actually have limited knowledge of how these embeddings perform in representing the complex hierarchy of domain scientific knowledge. In this paper, we conduct a comprehensive comparison of well-known embeddings’ capability in capturing hierarchical Physics knowledge. Several key findings are: (i) Poincaré embeddings do outperform when trained on the PhySH taxonomy, but they fail when trained on co-occurrence pairs extracted from raw text. (ii) No algorithm can properly learn hierarchies from the more realistic case of co-occurrence pairs, which contain noisy relations other than hierarchical relations. (iii) Our statistical analysis of the Poincaré embeddings’ representation of PhySH shows that successful hierarchical representations share two characteristics: first, upper-level terms have a smaller semantic distance to the root; second, upper-level hypernym-hyponym pairs should be further apart than lower-level hypernym-hyponym pairs.