François Lagunas
2021
Block Pruning For Faster Transformers
François Lagunas | Ella Charlaix | Victor Sanh | Alexander Rush
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Pre-training has improved model accuracy for both classification and generation tasks at the cost of introducing much larger and slower models. Pruning methods have proven to be an effective way of reducing model size, whereas distillation methods are proven for speeding up inference. We introduce a block pruning approach targeting both small and fast models. Our approach extends structured methods by considering blocks of any size and integrates this structure into the movement pruning paradigm for fine-tuning. We find that this approach learns to prune out full components of the underlying model, such as attention heads. Experiments consider classification and generation tasks, yielding among other results a pruned model that is a 2.4x faster, 74% smaller BERT on SQuAD v1, with a 1% drop on F1, competitive both with distilled models in speed and pruned models in size.
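As a rough illustration of the block structure described in the abstract, the following sketch partitions a weight matrix into fixed-size blocks, assigns one score per block, and keeps only the highest-scoring blocks. It is a minimal sketch under stated assumptions, not the authors' implementation: the paper learns block scores during fine-tuning in the movement pruning paradigm, whereas this example substitutes the sum of absolute weights as a stand-in score. The names `block_mask`, `apply_mask`, and `keep_fraction` are hypothetical and chosen only for this example.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def block_mask(weights: Matrix, block_rows: int, block_cols: int,
               keep_fraction: float) -> Matrix:
    """Return a 0/1 mask that keeps the highest-scoring blocks of `weights`.

    Assumption: the block score is the sum of absolute weights, a simple
    stand-in for the learned scores used in movement pruning.
    """
    n_rows, n_cols = len(weights), len(weights[0])
    if n_rows % block_rows or n_cols % block_cols:
        raise ValueError("matrix shape must be divisible by the block shape")

    # Compute one score per block, indexed by (block row, block column).
    scores: Dict[Tuple[int, int], float] = {}
    for bi in range(n_rows // block_rows):
        for bj in range(n_cols // block_cols):
            scores[(bi, bj)] = sum(
                abs(weights[r][c])
                for r in range(bi * block_rows, (bi + 1) * block_rows)
                for c in range(bj * block_cols, (bj + 1) * block_cols)
            )

    # Keep the top-scoring blocks; every other block is pruned as a whole.
    n_keep = max(1, round(keep_fraction * len(scores)))
    kept = set(sorted(scores, key=lambda k: scores[k], reverse=True)[:n_keep])

    return [
        [1.0 if (r // block_rows, c // block_cols) in kept else 0.0
         for c in range(n_cols)]
        for r in range(n_rows)
    ]


def apply_mask(weights: Matrix, mask: Matrix) -> Matrix:
    """Zero out every weight whose mask entry is 0."""
    return [
        [w if m else 0.0 for w, m in zip(w_row, m_row)]
        for w_row, m_row in zip(weights, mask)
    ]


# Toy 4x4 matrix split into 2x2 blocks; half of the blocks are kept.
example_weights: Matrix = [
    [0.90, -0.80, 0.01, 0.02],
    [0.70, 0.60, -0.03, 0.00],
    [0.05, 0.00, 0.50, -0.40],
    [-0.02, 0.01, 0.30, 0.20],
]
example_mask = block_mask(example_weights, 2, 2, keep_fraction=0.5)
example_pruned = apply_mask(example_weights, example_mask)
```

Because whole blocks are removed rather than individual weights, the surviving entries stay contiguous, which is what makes it possible for pruning to translate into faster dense computation and not only a smaller parameter count.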
Datasets: A Community Library for Natural Language Processing
Quentin Lhoest | Albert Villanova del Moral | Yacine Jernite | Abhishek Thakur | Patrick von Platen | Suraj Patil | Julien Chaumond | Mariama Drame | Julien Plu | Lewis Tunstall | Joe Davison | Mario Šaško | Gunjan Chhablani | Bhavitvya Malik | Simon Brandeis | Teven Le Scao | Victor Sanh | Canwen Xu | Nicolas Patry | Angelina McMillan-Major | Philipp Schmid | Sylvain Gugger | Clément Delangue | Théo Matussière | Lysandre Debut | Stas Bekman | Pierric Cistac | Thibault Goehringer | Victor Mustar | François Lagunas | Alexander Rush | Thomas Wolf
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
The scale, variety, and quantity of publicly available NLP datasets have grown rapidly as researchers propose new tasks, larger models, and novel benchmarks. Datasets is a community library for contemporary NLP designed to support this ecosystem. Datasets aims to standardize end-user interfaces, versioning, and documentation, while providing a lightweight front-end that behaves similarly for small datasets as for internet-scale corpora. The design of the library incorporates a distributed, community-driven approach to adding datasets and documenting usage. After a year of development, the library now includes more than 650 unique datasets, has more than 250 contributors, and has helped support a variety of novel cross-dataset research projects and shared tasks. The library is available at https://github.com/huggingface/datasets.
Co-authors
- Alexander M. Rush 2
- Victor Sanh 2
- Stas Bekman 1
- Simon Brandeis 1
- Ella Charlaix 1
- Julien Chaumond 1
- Gunjan Chhablani 1
- Pierric Cistac 1
- Joe Davison 1
- Lysandre Debut 1
- Clément Delangue 1
- Mariama Drame 1
- Thibault Goehringer 1
- Sylvain Gugger 1
- Yacine Jernite 1
- Teven Le Scao 1
- Quentin Lhoest 1
- Bhavitvya Malik 1
- Théo Matussière 1
- Angelina McMillan-Major 1
- Victor Mustar 1
- Suraj Patil 1
- Nicolas Patry 1
- Julien Plu 1
- Philipp Schmid 1
- Abhishek Thakur 1
- Lewis Tunstall 1
- Albert Villanova del Moral 1
- Thomas Wolf 1
- Canwen Xu 1
- Patrick von Platen 1
- Mario Šaško 1