Alan Ansell


pdf bib
MAD-G: Multilingual Adapter Generation for Efficient Cross-Lingual Transfer
Alan Ansell | Edoardo Maria Ponti | Jonas Pfeiffer | Sebastian Ruder | Goran Glavaš | Ivan Vulić | Anna Korhonen
Findings of the Association for Computational Linguistics: EMNLP 2021

Adapter modules have emerged as a general parameter-efficient means to specialize a pretrained encoder to new domains. Massively multilingual transformers (MMTs) have particularly benefited from additional training of language-specific adapters. However, this approach is not viable for the vast majority of languages, due to limitations in their corpus size or compute budgets. In this work, we propose MAD-G (Multilingual ADapter Generation), which contextually generates language adapters from language representations based on typological features. In contrast to prior work, our time- and space-efficient MAD-G approach enables (1) sharing of linguistic knowledge across languages and (2) zero-shot inference by generating language adapters for unseen languages. We thoroughly evaluate MAD-G in zero-shot cross-lingual transfer on part-of-speech tagging, dependency parsing, and named entity recognition. While offering (1) improved fine-tuning efficiency (by a factor of around 50 in our experiments), (2) a smaller parameter budget, and (3) increased language coverage, MAD-G remains competitive with more expensive methods for language-specific adapter training across the board. Moreover, it offers substantial benefits for low-resource languages, particularly on the NER task in low-resource African languages. Finally, we demonstrate that MAD-G’s transfer performance can be further improved via: (i) multi-source training, i.e., by generating and combining adapters of multiple languages with available task-specific training data; and (ii) by further fine-tuning generated MAD-G adapters for languages with monolingual data.

pdf bib
PolyLM: Learning about Polysemy through Language Modeling
Alan Ansell | Felipe Bravo-Marquez | Bernhard Pfahringer
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

To avoid the “meaning conflation deficiency” of word embeddings, a number of models have aimed to embed individual word senses. These methods at one time performed well on tasks such as word sense induction (WSI), but they have since been overtaken by task-specific techniques which exploit contextualized embeddings. However, sense embeddings and contextualization need not be mutually exclusive. We introduce PolyLM, a method which formulates the task of learning sense embeddings as a language modeling problem, allowing contextualization techniques to be applied. PolyLM is based on two underlying assumptions about word senses: firstly, that the probability of a word occurring in a given context is equal to the sum of the probabilities of its individual senses occurring; and secondly, that for a given occurrence of a word, one of its senses tends to be much more plausible in the context than the others. We evaluate PolyLM on WSI, showing that it performs considerably better than previous sense embedding techniques, and matches the current state-of-the-art specialized WSI method despite having six times fewer parameters. Code and pre-trained models are available at


pdf bib
An ELMo-inspired approach to SemDeep-5’s Word-in-Context task
Alan Ansell | Felipe Bravo-Marquez | Bernhard Pfahringer
Proceedings of the 5th Workshop on Semantic Deep Learning (SemDeep-5)