Lambert Mathias


pdf bib
Prompt-free and Efficient Few-shot Learning with Language Models
Rabeeh Karimi Mahabadi | Luke Zettlemoyer | James Henderson | Lambert Mathias | Marzieh Saeidi | Veselin Stoyanov | Majid Yazdani
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Current methods for few-shot fine-tuning of pretrained masked language models (PLMs) require carefully engineered prompts and verbalizers for each new task to convert examples into a cloze-format that the PLM can score. In this work, we propose Perfect, a simple and efficient method for few-shot fine-tuning of PLMs without relying on any such handcrafting, which is highly effective given as few as 32 data points. Perfect makes two key design choices: First, we show that manually engineered task prompts can be replaced with task-specific adapters that enable sample-efficient fine-tuning and reduce memory and storage costs by roughly factors of 5 and 100, respectively. Second, instead of using handcrafted verbalizers, we learn new multi-token label embeddings during fine-tuning, which are not tied to the model vocabulary and which allow us to avoid complex auto-regressive decoding. These embeddings are not only learnable from limited data but also enable nearly 100x faster training and inference. Experiments on a wide range of few shot NLP tasks demonstrate that Perfect, while being simple and efficient, also outperforms existing state-of-the-art few-shot learning methods. Our code is publicly available at

pdf bib
UniPELT: A Unified Framework for Parameter-Efficient Language Model Tuning
Yuning Mao | Lambert Mathias | Rui Hou | Amjad Almahairi | Hao Ma | Jiawei Han | Scott Yih | Madian Khabsa
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent parameter-efficient language model tuning (PELT) methods manage to match the performance of fine-tuning with much fewer trainable parameters and perform especially well when training data is limited. However, different PELT methods may perform rather differently on the same task, making it nontrivial to select the most appropriate method for a specific task, especially considering the fast-growing number of new PELT methods and tasks. In light of model diversity and the difficulty of model selection, we propose a unified framework, UniPELT, which incorporates different PELT methods as submodules and learns to activate the ones that best suit the current data or task setup via gating mechanism. On the GLUE benchmark, UniPELT consistently achieves 1 4% gains compared to the best individual PELT method that it incorporates and even outperforms fine-tuning under different setups. Moreover, UniPELT generally surpasses the upper bound that takes the best performance of all its submodules used individually on each task, indicating that a mixture of multiple PELT methods may be inherently more effective than single methods.

pdf bib
UNIREX: A Unified Learning Framework for Language Model Rationale Extraction
Aaron Chan | Maziar Sanjabi | Lambert Mathias | Liang Tan | Shaoliang Nie | Xiaochang Peng | Xiang Ren | Hamed Firooz
Proceedings of BigScience Episode #5 -- Workshop on Challenges & Perspectives in Creating Large Language Models

An extractive rationale explains a language model’s (LM’s) prediction on a given task instance by highlighting the text inputs that most influenced the prediction. Ideally, rationale extraction should be faithful (reflective of LM’s actual behavior) and plausible (convincing to humans), without compromising the LM’s (i.e., task model’s) task performance. Although attribution algorithms and select-predict pipelines are commonly used in rationale extraction, they both rely on certain heuristics that hinder them from satisfying all three desiderata. In light of this, we propose UNIREX, a flexible learning framework which generalizes rationale extractor optimization as follows: (1) specify architecture for a learned rationale extractor; (2) select explainability objectives (i.e., faithfulness and plausibility criteria); and (3) jointly the train task model and rationale extractor on the task using selected objectives. UNIREX enables replacing prior works’ heuristic design choices with a generic learned rationale extractor in (1) and optimizing it for all three desiderata in (2)-(3). To facilitate comparison between methods w.r.t. multiple desiderata, we introduce the Normalized Relative Gain (NRG) metric. Across five English text classification datasets, our best UNIREX configuration outperforms the strongest baselines by an average of 32.9% NRG. Plus, we find that UNIREX-trained rationale extractors’ faithfulness can even generalize to unseen datasets and tasks.


pdf bib
Findings of the WOAH 5 Shared Task on Fine Grained Hateful Memes Detection
Lambert Mathias | Shaoliang Nie | Aida Mostafazadeh Davani | Douwe Kiela | Vinodkumar Prabhakaran | Bertie Vidgen | Zeerak Waseem
Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021)

We present the results and main findings of the shared task at WOAH 5 on hateful memes detection. The task include two subtasks relating to distinct challenges in the fine-grained detection of hateful memes: (1) the protected category attacked by the meme and (2) the attack type. 3 teams submitted system description papers. This shared task builds on the hateful memes detection task created by Facebook AI Research in 2020.


pdf bib
Improving Long Distance Slot Carryover in Spoken Dialogue Systems
Tongfei Chen | Chetan Naik | Hua He | Pushpendre Rastogi | Lambert Mathias
Proceedings of the First Workshop on NLP for Conversational AI

Tracking the state of the conversation is a central component in task-oriented spoken dialogue systems. One such approach for tracking the dialogue state is slot carryover, where a model makes a binary decision if a slot from the context is relevant to the current turn. Previous work on the slot carryover task used models that made independent decisions for each slot. A close analysis of the results show that this approach results in poor performance over longer context dialogues. In this paper, we propose to jointly model the slots. We propose two neural network architectures, one based on pointer networks that incorporate slot ordering information, and the other based on transformer networks that uses self attention mechanism to model the slot interdependencies. Our experiments on an internal dialogue benchmark dataset and on the public DSTC2 dataset demonstrate that our proposed models are able to resolve longer distance slot references and are able to achieve competitive performance.


pdf bib
The Alexa Meaning Representation Language
Thomas Kollar | Danielle Berry | Lauren Stuart | Karolina Owczarzak | Tagyoung Chung | Lambert Mathias | Michael Kayser | Bradford Snow | Spyros Matsoukas
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)

This paper introduces a meaning representation for spoken language understanding. The Alexa meaning representation language (AMRL), unlike previous approaches, which factor spoken utterances into domains, provides a common representation for how people communicate in spoken language. AMRL is a rooted graph, links to a large-scale ontology, supports cross-domain queries, fine-grained types, complex utterances and composition. A spoken language dataset has been collected for Alexa, which contains ∼20k examples across eight domains. A version of this meaning representation was released to developers at a trade show in 2016.


pdf bib
Transfer Learning for Neural Semantic Parsing
Xing Fan | Emilio Monti | Lambert Mathias | Markus Dreyer
Proceedings of the 2nd Workshop on Representation Learning for NLP

The goal of semantic parsing is to map natural language to a machine interpretable meaning representation language (MRL). One of the constraints that limits full exploration of deep learning technologies for semantic parsing is the lack of sufficient annotation training data. In this paper, we propose using sequence-to-sequence in a multi-task setup for semantic parsing with focus on transfer learning. We explore three multi-task architectures for sequence-to-sequence model and compare their performance with the independently trained model. Our experiments show that the multi-task setup aids transfer learning from an auxiliary task with large labeled data to the target task with smaller labeled data. We see an absolute accuracy gain ranging from 1.0% to 4.4% in in our in-house data set and we also see good gains ranging from 2.5% to 7.0% on the ATIS semantic parsing tasks with syntactic and semantic auxiliary tasks.