Victor Sanh


Low-Complexity Probing via Finding Subnetworks
Steven Cao | Victor Sanh | Alexander Rush
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

The dominant approach in probing neural networks for linguistic properties is to train a new shallow multi-layer perceptron (MLP) on top of the model’s internal representations. This approach can detect properties encoded in the model, but at the cost of adding new parameters that may learn the task directly. We instead propose a subtractive pruning-based probe, where we find an existing subnetwork that performs the linguistic task of interest. Compared to an MLP, the subnetwork probe achieves both higher accuracy on pre-trained models and lower accuracy on random models, so it is both better at finding properties of interest and worse at learning on its own. Next, by varying the complexity of each probe, we show that subnetwork probing Pareto-dominates MLP probing in that it achieves higher accuracy given any budget of probe complexity. Finally, we analyze the resulting subnetworks across various tasks to locate where each task is encoded, and we find that lower-level tasks are captured in lower layers, reproducing similar findings in past work.
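
The Pareto-dominance claim in the abstract above can be made concrete with a small sketch. It assumes each probe is summarized as a list of (complexity, accuracy) points and that "dominates" means at least as accurate at every complexity budget and strictly more accurate at one or more. The function names and the numbers are hypothetical placeholders chosen for illustration; they are not results or code from the paper.

```python
from typing import List, Tuple

# One measurement of a probe: (complexity, accuracy).
Point = Tuple[float, float]


def best_accuracy_within(points: List[Point], budget: float) -> float:
    """Best accuracy reachable with complexity <= budget (-inf if none)."""
    eligible = [acc for cost, acc in points if cost <= budget]
    return max(eligible, default=float("-inf"))


def dominates(a: List[Point], b: List[Point]) -> bool:
    """True if probe `a` is at least as accurate as `b` at every budget
    and strictly more accurate at one or more budgets.

    Best-within-budget accuracy only changes at measured complexities,
    so checking the budgets observed in either curve is sufficient.
    """
    budgets = sorted({cost for cost, _ in a + b})
    pairs = [
        (best_accuracy_within(a, x), best_accuracy_within(b, x))
        for x in budgets
    ]
    return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)


# Hypothetical placeholder curves, NOT numbers from the paper.
subnetwork_curve = [(1.0, 0.80), (2.0, 0.90), (4.0, 0.93)]
mlp_curve = [(1.0, 0.60), (2.0, 0.75), (4.0, 0.88)]

print(dominates(subnetwork_curve, mlp_curve))
print(dominates(mlp_curve, subnetwork_curve))
```

How complexity is measured is left open here; any scalar that orders probes from simpler to more complex would fit this sketch.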


Transformers: State-of-the-Art Natural Language Processing
Thomas Wolf | Lysandre Debut | Victor Sanh | Julien Chaumond | Clement Delangue | Anthony Moi | Pierric Cistac | Tim Rault | Remi Louf | Morgan Funtowicz | Joe Davison | Sam Shleifer | Patrick von Platen | Clara Ma | Yacine Jernite | Julien Plu | Canwen Xu | Teven Le Scao | Sylvain Gugger | Mariama Drame | Quentin Lhoest | Alexander Rush
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Recent progress in natural language processing has been driven by advances in both model architecture and model pretraining. Transformer architectures have facilitated building higher-capacity models, and pretraining has made it possible to effectively utilize this capacity for a wide variety of tasks. Transformers is an open-source library with the goal of opening up these advances to the wider machine learning community. The library consists of carefully engineered state-of-the-art Transformer architectures under a unified API. Backing this library is a curated collection of pretrained models made by and available for the community. Transformers is designed to be extensible by researchers, simple for practitioners, and fast and robust in industrial deployments. The library is available at