Keyu Chen
2024
F-Eval: Asssessing Fundamental Abilities with Refined Evaluation Methods
Yu Sun
|
Keyu Chen
|
Shujie Wang
|
Peiji Li
|
Qipeng Guo
|
Hang Yan
|
Xipeng Qiu
|
Xuanjing Huang
|
Dahua Lin
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models (LLMs) garner significant attention for their unprecedented performance, leading to an increasing number of researches evaluating LLMs. However, these evaluation benchmarks are limited to assessing the instruction-following capabilities, overlooking the fundamental abilities that emerge during the pre-training stage. Previous subjective evaluation methods mainly reply on scoring by API models. However, in the absence of references, large models have shown limited ability to discern subtle differences. To bridge the gap, we propose F-Eval, a bilingual evaluation benchmark to evaluate the fundamental abilities, including expression, commonsense and logic. The tasks in F-Eval include multi-choice objective tasks, open-ended objective tasks, reference-based subjective tasks and reference-free subjective tasks. For reference-free subjective tasks, we devise new evaluation methods, serving as alternatives to scoring by API models. We conduct evaluations on 13 advanced LLMs. Results show that our evaluation methods show higher correlation coefficients and larger distinction than other evaluators. Additionally, we discuss the influence of different model sizes, dimensions, and normalization methods. We anticipate that F-Eval will facilitate the study of LLMs’ fundamental abilities.
2023
CoLLiE: Collaborative Training of Large Language Models in an Efficient Way
Kai Lv
|
Shuo Zhang
|
Tianle Gu
|
Shuhao Xing
|
Jiawei Hong
|
Keyu Chen
|
Xiaoran Liu
|
Yuqing Yang
|
Honglin Guo
|
Tengxiao Liu
|
Yu Sun
|
Qipeng Guo
|
Hang Yan
|
Xipeng Qiu
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Large language models (LLMs) are increasingly pivotal in a wide range of natural language processing tasks. Access to pre-trained models, courtesy of the open-source community, has made it possible to adapt these models to specific applications for enhanced performance. However, the substantial resources required for training these models necessitate efficient solutions. This paper introduces CoLLiE, an efficient library that facilitates collaborative training of large language models using 3D parallelism, parameter-efficient fine-tuning (PEFT) methods, and optimizers such as Lion, Adan, Sophia, and LOMO. With its modular design and comprehensive functionality, CoLLiE offers a balanced blend of efficiency, ease of use, and customization. CoLLiE has proven superior training efficiency in comparison with prevalent solutions in pre-training and fine-tuning scenarios. Furthermore, we provide an empirical evaluation of the correlation between model size and GPU memory consumption under different optimization methods, as well as an analysis of the throughput. Lastly, we carry out a comprehensive comparison of various optimizers and PEFT methods within the instruction-tuning context. CoLLiE is available at https://github.com/OpenLMLab/collie.
Search
Fix author
Co-authors
- Qipeng Guo 2
- Xipeng Qiu (邱锡鹏) 2
- Yu Sun 2
- Hang Yan (航 颜) 2
- Tianle Gu 1
- show all...