DEMix Layers: Disentangling Domains for Modular Language Modeling

Suchin Gururangan, Mike Lewis, Ari Holtzman, Noah A. Smith, Luke Zettlemoyer


Abstract
We introduce a new domain expert mixture (DEMix) layer that enables conditioning a language model (LM) on the domain of the input text. A DEMix layer includes a collection of expert feedforward networks, each specialized to a domain, that makes the LM modular: experts can be mixed, added, or removed after initial training. Extensive experiments with autoregressive transformer LMs (up to 1.3B parameters) show that DEMix layers reduce test-time perplexity (especially for out-of-domain data), increase training efficiency, and enable rapid adaptation. Mixing experts during inference, using a parameter-free weighted ensemble, enables better generalization to heterogeneous or unseen domains. We also show it is possible to add experts to adapt to new domains without forgetting older ones, and remove experts to restrict access to unwanted domains. Overall, these results demonstrate benefits of domain modularity in language models.
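To make the mechanism described in the abstract concrete, below is a minimal PyTorch sketch of a DEMix-style layer: one expert feedforward network per training domain, hard routing when the domain label is known, and a parameter-free weighted ensemble of expert outputs when it is not (here, a Bayes posterior over domains under a uniform prior). All names (ExpertFFN, DEMixLayer, domain_posterior) and details are illustrative assumptions, not the authors' code; the official implementation is in kernelmachine/demix.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ExpertFFN(nn.Module):
        """One domain expert: a standard transformer feedforward block."""
        def __init__(self, d_model: int, d_ff: int):
            super().__init__()
            self.w_in = nn.Linear(d_model, d_ff)
            self.w_out = nn.Linear(d_ff, d_model)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.w_out(F.gelu(self.w_in(x)))

    class DEMixLayer(nn.Module):
        """One expert FFN per training domain, replacing the shared FFN.

        Training batches carry a known domain label, so each batch is
        routed to exactly one expert; an expert never receives gradients
        from other domains' data.
        """
        def __init__(self, num_domains: int, d_model: int, d_ff: int):
            super().__init__()
            self.experts = nn.ModuleList(
                ExpertFFN(d_model, d_ff) for _ in range(num_domains)
            )

        def forward(self, x, domain_id=None, mixture_weights=None):
            if domain_id is not None:
                # Known domain: deterministic routing to a single expert.
                return self.experts[domain_id](x)
            # Unknown or heterogeneous domain: weighted ensemble of all
            # expert outputs; mixture_weights has shape (num_experts,).
            outs = torch.stack([e(x) for e in self.experts])  # (E, B, T, D)
            w = mixture_weights.view(-1, 1, 1, 1)             # (E, 1, 1, 1)
            return (w * outs).sum(dim=0)                      # (B, T, D)

    @torch.no_grad()
    def domain_posterior(prefix_log_likelihoods: torch.Tensor) -> torch.Tensor:
        """Parameter-free mixture weights via Bayes' rule with a uniform
        prior: p(domain | prefix) is proportional to p(prefix | domain).
        Input: each expert's log-likelihood of the context so far."""
        return F.softmax(prefix_log_likelihoods, dim=-1)

    # Usage (hypothetical shapes): hard routing at training time,
    # posterior-weighted ensembling at test time.
    layer = DEMixLayer(num_domains=8, d_model=512, d_ff=2048)
    x = torch.randn(4, 128, 512)            # (batch, seq_len, d_model)
    y_train = layer(x, domain_id=3)
    w = domain_posterior(torch.randn(8))    # stand-in log-likelihoods
    y_test = layer(x, mixture_weights=w)

Because each expert only ever sees gradients from its own domain's batches, the experts remain independent, which is what makes it possible to add an expert for a new domain, or remove one to restrict access to an unwanted domain, without retraining the rest of the model.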
Anthology ID: 2022.naacl-main.407
Volume: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month: July
Year: 2022
Address: Seattle, United States
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 5557–5576
URL: https://aclanthology.org/2022.naacl-main.407
DOI: 10.18653/v1/2022.naacl-main.407
Cite (ACL): Suchin Gururangan, Mike Lewis, Ari Holtzman, Noah A. Smith, and Luke Zettlemoyer. 2022. DEMix Layers: Disentangling Domains for Modular Language Modeling. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 5557–5576, Seattle, United States. Association for Computational Linguistics.
Cite (Informal): DEMix Layers: Disentangling Domains for Modular Language Modeling (Gururangan et al., NAACL 2022)
PDF: https://aclanthology.org/2022.naacl-main.407.pdf
Code: kernelmachine/demix (plus additional community code)
Data: CORD-19, S2ORC