ElitePLM: An Empirical Study on General Language Ability Evaluation of Pretrained Language Models

Junyi Li, Tianyi Tang, Zheng Gong, Lixin Yang, Zhuohao Yu, Zhipeng Chen, Jingyuan Wang, Xin Zhao, Ji-Rong Wen


Abstract
Nowadays, pretrained language models (PLMs) have dominated the majority of NLP tasks. However, little research has been conducted on systematically evaluating the language abilities of PLMs. In this paper, we present a large-scale empirical study on general language ability evaluation of PLMs (ElitePLM). In our study, we design four evaluation dimensions, memory, comprehension, reasoning, and composition, to measure ten widely-used PLMs within five categories. Our empirical results demonstrate that: (1) PLMs with varying training objectives and strategies are good at different ability tests; (2) fine-tuning PLMs on downstream tasks is usually sensitive to the data size and distribution; (3) PLMs have excellent transferability between similar tasks. Moreover, the prediction results of PLMs in our experiments are released as an open resource for deeper and more detailed analysis of the language abilities of PLMs. This paper can guide future work in selecting, applying, and designing PLMs for specific tasks. We have made all the details of experiments publicly available at https://github.com/RUCAIBox/ElitePLM.
Anthology ID:
2022.naacl-main.258
Volume:
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
July
Year:
2022
Address:
Seattle, United States
Editors:
Marine Carpuat, Marie-Catherine de Marneffe, Ivan Vladimir Meza Ruiz
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
3519–3539
URL:
https://aclanthology.org/2022.naacl-main.258
DOI:
10.18653/v1/2022.naacl-main.258
Cite (ACL):
Junyi Li, Tianyi Tang, Zheng Gong, Lixin Yang, Zhuohao Yu, Zhipeng Chen, Jingyuan Wang, Xin Zhao, and Ji-Rong Wen. 2022. ElitePLM: An Empirical Study on General Language Ability Evaluation of Pretrained Language Models. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3519–3539, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
ElitePLM: An Empirical Study on General Language Ability Evaluation of Pretrained Language Models (Li et al., NAACL 2022)
PDF:
https://aclanthology.org/2022.naacl-main.258.pdf
Video:
https://aclanthology.org/2022.naacl-main.258.mp4
Code
rucaibox/eliteplm
Data
CommonsenseQA, GLUE, HellaSwag, LAMA, QNLI, RACE, SWAG, SuperGLUE, WritingPrompts