Moin Nabi


2021

pdf bib
Towards Zero-shot Commonsense Reasoning with Self-supervised Refinement of Language Models
Tassilo Klein | Moin Nabi
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Can we get existing language models and refine them for zero-shot commonsense reasoning? This paper presents an initial study exploring the feasibility of zero-shot commonsense reasoning for the Winograd Schema Challenge by formulating the task as self-supervised refinement of a pre-trained language model. In contrast to previous studies that rely on fine-tuning annotated datasets, we seek to boost conceptualization via loss landscape refinement. To this end, we propose a novel self-supervised learning approach that refines the language model utilizing a set of linguistic perturbations of similar concept relationships. Empirical analysis of our conceptually simple framework demonstrates the viability of zero-shot commonsense reasoning on multiple benchmarks.

pdf bib
Attention-based Contrastive Learning for Winograd Schemas
Tassilo Klein | Moin Nabi
Findings of the Association for Computational Linguistics: EMNLP 2021

Self-supervised learning has recently attracted considerable attention in the NLP community for its ability to learn discriminative features using a contrastive objective. This paper investigates whether contrastive learning can be extended to Transfomer attention to tackling the Winograd Schema Challenge. To this end, we propose a novel self-supervised framework, leveraging a contrastive loss directly at the level of self-attention. Experimental analysis of our attention-based models on multiple datasets demonstrates superior commonsense reasoning capabilities. The proposed approach outperforms all comparable unsupervised approaches while occasionally surpassing supervised ones.

pdf bib
EaSe: A Diagnostic Tool for VQA based on Answer Diversity
Shailza Jolly | Sandro Pezzelle | Moin Nabi
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

We propose EASE, a simple diagnostic tool for Visual Question Answering (VQA) which quantifies the difficulty of an image, question sample. EASE is based on the pattern of answers provided by multiple annotators to a given question. In particular, it considers two aspects of the answers: (i) their Entropy; (ii) their Semantic content. First, we prove the validity of our diagnostic to identify samples that are easy/hard for state-of-art VQA models. Second, we show that EASE can be successfully used to select the most-informative samples for training/fine-tuning. Crucially, only information that is readily available in any VQA dataset is used to compute its scores.

2020

pdf bib
Contrastive Self-Supervised Learning for Commonsense Reasoning
Tassilo Klein | Moin Nabi
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We propose a self-supervised method to solve Pronoun Disambiguation and Winograd Schema Challenge problems. Our approach exploits the characteristic structure of training corpora related to so-called “trigger” words, which are responsible for flipping the answer in pronoun disambiguation. We achieve such commonsense reasoning by constructing pair-wise contrastive auxiliary predictions. To this end, we leverage a mutual exclusive loss regularized by a contrastive margin. Our architecture is based on the recently introduced transformer networks, BERT, that exhibits strong performance on many NLP benchmarks. Empirical results show that our method alleviates the limitation of current supervised approaches for commonsense reasoning. This study opens up avenues for exploiting inexpensive self-supervision to achieve performance gain in commonsense reasoning tasks.

2019

pdf bib
Attention Is (not) All You Need for Commonsense Reasoning
Tassilo Klein | Moin Nabi
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

The recently introduced BERT model exhibits strong performance on several language understanding benchmarks. In this paper, we describe a simple re-implementation of BERT for commonsense reasoning. We show that the attentions produced by BERT can be directly utilized for tasks such as the Pronoun Disambiguation Problem and Winograd Schema Challenge. Our proposed attention-guided commonsense reasoning method is conceptually simple yet empirically powerful. Experimental analysis on multiple datasets demonstrates that our proposed system performs remarkably well on all cases while outperforming the previously reported state of the art by a margin. While results suggest that BERT seems to implicitly learn to establish complex relationships between entities, solving commonsense reasoning tasks might require more than unsupervised models learned from huge text corpora.

pdf bib
Proceedings of the Second Workshop on Shortcomings in Vision and Language
Raffaella Bernardi | Raquel Fernandez | Spandana Gella | Kushal Kafle | Christopher Kanan | Stefan Lee | Moin Nabi
Proceedings of the Second Workshop on Shortcomings in Vision and Language

2017

pdf bib
FOIL it! Find One mismatch between Image and Language caption
Ravi Shekhar | Sandro Pezzelle | Yauhen Klimovich | Aurélie Herbelot | Moin Nabi | Enver Sangineto | Raffaella Bernardi
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In this paper, we aim to understand whether current language and vision (LaVi) models truly grasp the interaction between the two modalities. To this end, we propose an extension of the MS-COCO dataset, FOIL-COCO, which associates images with both correct and ‘foil’ captions, that is, descriptions of the image that are highly similar to the original ones, but contain one single mistake (‘foil word’). We show that current LaVi models fall into the traps of this data and perform badly on three tasks: a) caption classification (correct vs. foil); b) foil word detection; c) foil word correction. Humans, in contrast, have near-perfect performance on those tasks. We demonstrate that merely utilising language cues is not enough to model FOIL-COCO and that it challenges the state-of-the-art by requiring a fine-grained understanding of the relation between text and image.

pdf bib
Self-Crowdsourcing Training for Relation Extraction
Azad Abad | Moin Nabi | Alessandro Moschitti
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

In this paper we introduce a self-training strategy for crowdsourcing. The training examples are automatically selected to train the crowd workers. Our experimental results show an impact of 5% Improvement in terms of F1 for relation extraction task, compared to the method based on distant supervision.

pdf bib
Vision and Language Integration: Moving beyond Objects
Ravi Shekhar | Sandro Pezzelle | Aurélie Herbelot | Moin Nabi | Enver Sangineto | Raffaella Bernardi
IWCS 2017 — 12th International Conference on Computational Semantics — Short papers