Analyzing Pre-trained and Fine-tuned Language Models

Marius Mosbach


Abstract
Since the introduction of transformer-based language models in 2018, the current generation of natural language processing (NLP) models continues to demonstrate impressive capabilities on a variety of academic benchmarks and real-world applications. This progress is based on a simple but general pipeline that consists of pre-training neural language models on large quantities of text, followed by an adaptation step that fine-tunes the pre-trained model to perform a specific NLP task of interest. However, despite the impressive progress on academic benchmarks and the widespread deployment of pre-trained and fine-tuned language models in industry, we still lack a fundamental understanding of how and why these models work, as well as of the individual steps of the pipeline that produce them. We make several contributions towards improving this understanding: we analyze the linguistic knowledge of pre-trained language models and how it is affected by fine-tuning, and we rigorously analyze the fine-tuning process itself and how the choice of adaptation technique affects the generalization of models. In doing so, we provide new insights into previously unexplained phenomena and into the capabilities of pre-trained and fine-tuned language models.
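
To make the pipeline described in the abstract concrete, below is a minimal sketch (not code from the paper) of the pre-train-then-fine-tune recipe using the Hugging Face transformers and datasets libraries; the checkpoint ("bert-base-uncased"), the downstream task (SST-2), and all hyperparameters are illustrative assumptions.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Step 1: start from a pre-trained language model (the expensive pre-training
# step is assumed to have been done already; we only load the checkpoint).
model_name = "bert-base-uncased"  # illustrative checkpoint, not from the paper
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Step 2: adaptation -- fine-tune the pre-trained model on a downstream task
# (SST-2 sentiment classification is used here purely as an example task).
dataset = load_dataset("glue", "sst2")

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True, max_length=128)

encoded = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="finetuned-sst2",       # assumed output directory
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    tokenizer=tokenizer,  # enables dynamic padding when batching examples
)
trainer.train()
```

The choice of adaptation technique studied in the paper (e.g., full fine-tuning versus alternatives) would correspond to changing how the loaded model is updated in Step 2; the sketch above shows only the standard full fine-tuning setup.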
Anthology ID: 2023.bigpicture-1.10
Volume: Proceedings of the Big Picture Workshop
Month: December
Year: 2023
Address: Singapore
Editors: Yanai Elazar, Allyson Ettinger, Nora Kassner, Sebastian Ruder, Noah A. Smith
Venue: BigPicture
Publisher: Association for Computational Linguistics
Pages: 123–134
URL: https://aclanthology.org/2023.bigpicture-1.10
DOI: 10.18653/v1/2023.bigpicture-1.10
Cite (ACL): Marius Mosbach. 2023. Analyzing Pre-trained and Fine-tuned Language Models. In Proceedings of the Big Picture Workshop, pages 123–134, Singapore. Association for Computational Linguistics.
Cite (Informal): Analyzing Pre-trained and Fine-tuned Language Models (Mosbach, BigPicture 2023)
PDF: https://aclanthology.org/2023.bigpicture-1.10.pdf