@inproceedings{mai-etal-2023-hypermixer,
    title = "{H}yper{M}ixer: An {MLP}-based Low Cost Alternative to Transformers",
    author = "Mai, Florian and
      Pannatier, Arnaud and
      Fehr, Fabio and
      Chen, Haolin and
      Marelli, Francois and
      Fleuret, Francois and
      Henderson, James",
    editor = "Rogers, Anna and
      Boyd-Graber, Jordan and
      Okazaki, Naoaki",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.871",
    doi = "10.18653/v1/2023.acl-long.871",
    pages = "15632--15654",
    abstract = "Transformer-based architectures are the model of choice for natural language understanding, but they come at a significant cost, as they have quadratic complexity in the input length, require a lot of training data, and can be difficult to tune. In the pursuit of lower costs, we investigate simple MLP-based architectures. We find that existing architectures such as MLPMixer, which achieves token mixing through a static MLP applied to each feature independently, are too detached from the inductive biases required for natural language understanding. In this paper, we propose a simple variant, HyperMixer, which forms the token mixing MLP dynamically using hypernetworks. Empirically, we demonstrate that our model performs better than alternative MLP-based models, and on par with Transformers. In contrast to Transformers, HyperMixer achieves these results at substantially lower costs in terms of processing time, training data, and hyperparameter tuning.",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="mai-etal-2023-hypermixer">
    <titleInfo>
      <title>HyperMixer: An MLP-based Low Cost Alternative to Transformers</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Florian</namePart>
      <namePart type="family">Mai</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Arnaud</namePart>
      <namePart type="family">Pannatier</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Fabio</namePart>
      <namePart type="family">Fehr</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Haolin</namePart>
      <namePart type="family">Chen</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Francois</namePart>
      <namePart type="family">Marelli</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Francois</namePart>
      <namePart type="family">Fleuret</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">James</namePart>
      <namePart type="family">Henderson</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <originInfo>
      <dateIssued>2023-07</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
      <titleInfo>
        <title>Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</title>
      </titleInfo>
      <name type="personal">
        <namePart type="given">Anna</namePart>
        <namePart type="family">Rogers</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Jordan</namePart>
        <namePart type="family">Boyd-Graber</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Naoaki</namePart>
        <namePart type="family">Okazaki</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <originInfo>
        <publisher>Association for Computational Linguistics</publisher>
        <place>
          <placeTerm type="text">Toronto, Canada</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Transformer-based architectures are the model of choice for natural language understanding, but they come at a significant cost, as they have quadratic complexity in the input length, require a lot of training data, and can be difficult to tune. In the pursuit of lower costs, we investigate simple MLP-based architectures. We find that existing architectures such as MLPMixer, which achieves token mixing through a static MLP applied to each feature independently, are too detached from the inductive biases required for natural language understanding. In this paper, we propose a simple variant, HyperMixer, which forms the token mixing MLP dynamically using hypernetworks. Empirically, we demonstrate that our model performs better than alternative MLP-based models, and on par with Transformers. In contrast to Transformers, HyperMixer achieves these results at substantially lower costs in terms of processing time, training data, and hyperparameter tuning.</abstract>
    <identifier type="citekey">mai-etal-2023-hypermixer</identifier>
    <identifier type="doi">10.18653/v1/2023.acl-long.871</identifier>
    <location>
      <url>https://aclanthology.org/2023.acl-long.871</url>
    </location>
    <part>
      <date>2023-07</date>
      <extent unit="page">
        <start>15632</start>
        <end>15654</end>
      </extent>
    </part>
  </mods>
</modsCollection>
%0 Conference Proceedings
%T HyperMixer: An MLP-based Low Cost Alternative to Transformers
%A Mai, Florian
%A Pannatier, Arnaud
%A Fehr, Fabio
%A Chen, Haolin
%A Marelli, Francois
%A Fleuret, Francois
%A Henderson, James
%Y Rogers, Anna
%Y Boyd-Graber, Jordan
%Y Okazaki, Naoaki
%S Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
%D 2023
%8 July
%I Association for Computational Linguistics
%C Toronto, Canada
%F mai-etal-2023-hypermixer
%X Transformer-based architectures are the model of choice for natural language understanding, but they come at a significant cost, as they have quadratic complexity in the input length, require a lot of training data, and can be difficult to tune. In the pursuit of lower costs, we investigate simple MLP-based architectures. We find that existing architectures such as MLPMixer, which achieves token mixing through a static MLP applied to each feature independently, are too detached from the inductive biases required for natural language understanding. In this paper, we propose a simple variant, HyperMixer, which forms the token mixing MLP dynamically using hypernetworks. Empirically, we demonstrate that our model performs better than alternative MLP-based models, and on par with Transformers. In contrast to Transformers, HyperMixer achieves these results at substantially lower costs in terms of processing time, training data, and hyperparameter tuning.
%R 10.18653/v1/2023.acl-long.871
%U https://aclanthology.org/2023.acl-long.871
%U https://doi.org/10.18653/v1/2023.acl-long.871
%P 15632-15654
Markdown (Informal)
[HyperMixer: An MLP-based Low Cost Alternative to Transformers](https://aclanthology.org/2023.acl-long.871) (Mai et al., ACL 2023)
ACL
- Florian Mai, Arnaud Pannatier, Fabio Fehr, Haolin Chen, Francois Marelli, Francois Fleuret, and James Henderson. 2023. HyperMixer: An MLP-based Low Cost Alternative to Transformers. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 15632–15654, Toronto, Canada. Association for Computational Linguistics.
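
As a rough illustration of the architecture the abstract describes (a token-mixing MLP whose weights are generated dynamically from the tokens by hypernetworks), here is a minimal sketch in PyTorch. The class and parameter names (`HyperMixing`, `d_model`, `d_hidden`) and the exact hypernetwork layout are assumptions made for this sketch, not the authors' reference implementation; see the paper itself for the precise formulation.

```python
import torch
import torch.nn as nn

class HyperMixing(nn.Module):
    """Sketch of a HyperMixer-style token-mixing layer.

    Two position-wise hypernetworks map each token to one row of the
    token-mixing weight matrices W_in and W_out, so the mixing MLP is a
    function of the input rather than a fixed parameter (as in MLPMixer).
    """

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        # Hypernetworks: small MLPs applied to each token independently,
        # each producing one d_hidden-dimensional weight row per token.
        self.hyper_in = nn.Sequential(
            nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, d_hidden))
        self.hyper_out = nn.Sequential(
            nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, d_hidden))
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, N, d_model); N may vary between inputs, unlike
        # MLPMixer, whose static token-mixing MLP fixes N in advance.
        w_in = self.hyper_in(x)     # (batch, N, d_hidden)
        w_out = self.hyper_out(x)   # (batch, N, d_hidden)
        # Token mixing: W_out @ GELU(W_in^T @ X) -- cost linear in N,
        # versus the quadratic cost of self-attention.
        mixed = self.act(w_in.transpose(1, 2) @ x)  # (batch, d_hidden, d_model)
        return w_out @ mixed                        # (batch, N, d_model)

# Usage: the same layer handles sequences of different lengths.
layer = HyperMixing(d_model=256, d_hidden=512)
print(layer(torch.randn(2, 100, 256)).shape)  # torch.Size([2, 100, 256])
print(layer(torch.randn(2, 37, 256)).shape)   # torch.Size([2, 37, 256])
```

Because the mixing weights are produced per token, the layer's cost grows linearly with sequence length, consistent with the abstract's claim of substantially lower processing time than Transformers' quadratic attention.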