Honghong Zhao
2024
SparkRA: A Retrieval-Augmented Knowledge Service System Based on Spark Large Language Model
Dayong Wu | Jiaqi Li | Baoxin Wang | Honghong Zhao | Siyuan Xue | Yanjie Yang | Zhijun Chang | Rui Zhang | Li Qian | Bo Wang | Shijin Wang | Zhixiong Zhang | Guoping Hu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Large language models (LLMs) have shown remarkable achievements across various language tasks. To enhance the performance of LLMs in scientific literature services, we developed the scientific literature LLM (SciLit-LLM) through pre-training and supervised fine-tuning on scientific literature, building upon the iFLYTEK Spark LLM. Furthermore, we present a knowledge service system, Spark Research Assistant (SparkRA), based on our SciLit-LLM. SparkRA is accessible online and provides three primary functions: literature investigation, paper reading, and academic writing. As of July 30, 2024, SparkRA has garnered over 50,000 registered users, with a total usage count exceeding 1.3 million.
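As the title indicates, SparkRA couples the fine-tuned SciLit-LLM with retrieval over a scientific-literature corpus. The following is a minimal sketch of a generic retrieval-augmented pipeline, not the actual SparkRA implementation: the `embed`, `retrieve`, and `call_llm` functions are hypothetical placeholders standing in for a trained encoder, a vector index, and the LLM service.

```python
# Minimal retrieval-augmented generation sketch (hypothetical; not SparkRA's code).
# Documents are embedded, the top-k passages are retrieved for a query, and the
# retrieved context is prepended to the prompt sent to the language model.
import numpy as np


def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy deterministic embedding stub; a real system would use a trained encoder."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k corpus passages most similar to the query by cosine similarity."""
    q = embed(query)
    scores = [float(q @ embed(doc)) for doc in corpus]
    top = np.argsort(scores)[::-1][:k]
    return [corpus[i] for i in top]


def call_llm(prompt: str) -> str:
    """Placeholder for the actual scientific-literature LLM call."""
    return f"[LLM response for prompt of {len(prompt)} characters]"


def answer(query: str, corpus: list[str]) -> str:
    """Build a retrieval-augmented prompt and pass it to the (stubbed) LLM."""
    context = "\n".join(retrieve(query, corpus))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return call_llm(prompt)


if __name__ == "__main__":
    papers = [
        "Paper A: knowledge distillation of BERT with Earth Mover's Distance.",
        "Paper B: retrieval-augmented question answering over scientific text.",
        "Paper C: supervised fine-tuning of large language models.",
    ]
    print(answer("Which paper studies knowledge distillation?", papers))
```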
2020
BERT-EMD: Many-to-Many Layer Mapping for BERT Compression with Earth Mover’s Distance
Jianquan Li | Xiaokang Liu | Honghong Zhao | Ruifeng Xu | Min Yang | Yaohong Jin
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Pre-trained language models (e.g., BERT) have achieved significant success in various natural language processing (NLP) tasks. However, high storage and computational costs prevent pre-trained language models from being effectively deployed on resource-constrained devices. In this paper, we propose a novel BERT distillation method based on many-to-many layer mapping, which allows each intermediate student layer to learn from any intermediate teacher layers. In this way, our model can learn from different teacher layers adaptively for different NLP tasks. In addition, we leverage Earth Mover’s Distance (EMD) to compute the minimum cumulative cost that must be paid to transform knowledge from the teacher network to the student network. EMD enables effective matching for the many-to-many layer mapping. Furthermore, we propose a cost attention mechanism to learn the layer weights used in EMD automatically, which is expected to further improve the model’s performance and accelerate convergence. Extensive experiments on the GLUE benchmark demonstrate that our model achieves competitive performance compared to strong competitors in terms of both accuracy and model compression.
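To make the many-to-many layer mapping concrete, the sketch below solves an Earth Mover's Distance transport problem between a set of teacher-layer representations and a set of student-layer representations. It is a simplified illustration of the idea described in the abstract, not the paper's exact formulation: the MSE layer cost, the uniform layer weights, and the `layer_emd` helper are assumptions for the example (the paper learns the weights via its cost attention mechanism).

```python
# Hedged sketch: EMD-based many-to-many layer mapping for distillation.
# Each teacher layer "ships" weight to student layers; the optimal flow
# minimises the cumulative transfer cost between layer representations.
import numpy as np
from scipy.optimize import linprog


def layer_emd(teacher_layers, student_layers, w_teacher, w_student):
    """Solve the optimal-transport LP between teacher and student layer sets.

    teacher_layers: list of T arrays (hidden representations, same shape)
    student_layers: list of S arrays
    w_teacher, w_student: per-layer weights, each summing to 1
    Returns (emd_value, flow matrix of shape T x S).
    """
    T, S = len(teacher_layers), len(student_layers)
    # Pairwise transfer cost: MSE between each teacher and student layer.
    cost = np.array([[np.mean((t - s) ** 2) for s in student_layers]
                     for t in teacher_layers])
    # Flow variables f[i, j] >= 0, flattened row-major for linprog.
    c = cost.ravel()
    A_eq, b_eq = [], []
    for i in range(T):                       # teacher layer i emits w_teacher[i]
        row = np.zeros(T * S)
        row[i * S:(i + 1) * S] = 1
        A_eq.append(row)
        b_eq.append(w_teacher[i])
    for j in range(S):                       # student layer j receives w_student[j]
        row = np.zeros(T * S)
        row[j::S] = 1
        A_eq.append(row)
        b_eq.append(w_student[j])
    res = linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None), method="highs")
    flow = res.x.reshape(T, S)
    return float(res.fun), flow


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = [rng.standard_normal((4, 8)) for _ in range(6)]  # 6 teacher layers
    student = [rng.standard_normal((4, 8)) for _ in range(3)]  # 3 student layers
    w_t = np.full(6, 1 / 6)   # uniform weights here; the paper's cost attention
    w_s = np.full(3, 1 / 3)   # mechanism would learn these automatically
    emd, flow = layer_emd(teacher, student, w_t, w_s)
    print("EMD distillation loss:", round(emd, 4))
    print("Flow (teacher -> student):\n", flow.round(3))
```

The resulting flow matrix indicates how strongly each student layer should imitate each teacher layer, which is the sense in which the mapping is many-to-many rather than a fixed one-to-one layer assignment.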