Constantin Eichenberg
2024
Divergent Token Metrics: Measuring degradation to prune away LLM components – and optimize quantization
Björn Deiseroth
|
Max Meuer
|
Nikolas Gritsch
|
Constantin Eichenberg
|
Patrick Schramowski
|
Matthias Aßenmacher
|
Kristian Kersting
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Large Language Models (LLMs) have reshaped natural language processing with their impressive capabilities. However, their ever-increasing size has raised concerns about their effective deployment and the need for LLM compression. This study introduces the Divergent Token Metrics (DTMs), a novel approach to assessing compressed LLMs, addressing the limitations of traditional perplexity or accuracy measures that fail to accurately reflect text generation quality. DTMs measure token divergences that allow deeper insights into the subtleties of model compression, in particular, when evaluating components’ impacts individually. Utilizing the First Divergent Token Metric (FDTM) in model sparsification reveals that 25% of all attention components can be pruned beyond 90% on the Llama-2 model family, still keeping SOTA performance. For quantization, FDTM suggests that more than 80% of parameters can be naively transformed to int8 without special outlier management. These evaluations indicate the necessity of choosing appropriate compressions for parameters individually—and that FDTM can identify those—while standard metrics result in deteriorated outcomes.
2022
MAGMA – Multimodal Augmentation of Generative Models through Adapter-based Finetuning
Constantin Eichenberg
|
Sidney Black
|
Samuel Weinbach
|
Letitia Parcalabescu
|
Anette Frank
Findings of the Association for Computational Linguistics: EMNLP 2022
Large-scale pretraining is fast becoming the norm in Vision-Language (VL) modeling. However, prevailing VL approaches are limited by the requirement for labeled data and the use of complex multi-step pretraining objectives. We present MAGMA - a simple method for augmenting generative language models with additional modalities using adapter-based finetuning. Building on Frozen, we train a series of VL models that autoregressively generate text from arbitrary combinations of visual and textual input. The pretraining is entirely end-to-end using a single language modeling objective, simplifying optimization compared to previous approaches. Importantly, the language model weights remain unchanged during training, allowing for transfer of encyclopedic knowledge and in-context learning abilities from language pretraining. MAGMA outperforms Frozen on open-ended generative tasks, achieving state of the art results on the OKVQA benchmark and competitive results on a range of other popular VL benchmarks, while pretraining on 0.2 % of the number of samples used to train SimVLM.
Search
Fix data
Co-authors
- Matthias Aßenmacher 1
- Sidney Black 1
- Björn Deiseroth 1
- Anette Frank 1
- Nikolas Gritsch 1
- show all...