AbstractTransformer-based language models (LMs) pretrained on large text collections implicitly store a wealth of lexical semantic knowledge, but it is non-trivial to extract that knowledge effectively from their parameters. Inspired by prior work on semantic specialization of static word embedding (WE) models, we show that it is possible to expose and enrich lexical knowledge from the LMs, that is, to specialize them to serve as effective and universal “decontextualized” word encoders even when fed input words “in isolation” (i.e., without any context). Their transformation into such word encoders is achieved through a simple and efficient lexical fine-tuning procedure (termed LexFit) based on dual-encoder network structures. Further, we show that LexFit can yield effective word encoders even with limited lexical supervision and, via cross-lingual transfer, in different languages without any readily available external knowledge. Our evaluation over four established, structurally different lexical-level tasks in 8 languages indicates the superiority of LexFit-based WEs over standard static WEs (e.g., fastText) and WEs from vanilla LMs. Other extensive experiments and ablation studies further profile the LexFit framework, and indicate best practices and performance variations across LexFit variants, languages, and lexical tasks, also directly questioning the usefulness of traditional WE models in the era of large neural models.