Nikolay Malkin


Coherence boosting: When your pretrained language model is not paying enough attention
Nikolay Malkin | Zhen Wang | Nebojsa Jojic
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Long-range semantic coherence remains a challenge in automatic language generation and understanding. We demonstrate that large language models have insufficiently learned the effect of distant words on next-token prediction. We present coherence boosting, an inference procedure that increases an LM’s focus on a long context. We show the benefits of coherence boosting with pretrained models by distributional analyses of generated ordinary text and dialog responses. We also find that coherence boosting with state-of-the-art models for various zero-shot NLP tasks yields performance gains with no additional training.
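
The abstract above does not state how the procedure is computed. As a rough illustration only, the sketch below assumes one plausible form: a log-linear contrast between the model's next-token prediction given the full context and its prediction given only a short, recent context. The function name `boost`, the weight `alpha`, and the toy logits are hypothetical and are not taken from the listing.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def boost(full_logits, short_logits, alpha=0.5):
    """Assumed contrast: (1 + alpha) * log p_full - alpha * log p_short.

    full_logits  -- next-token logits given the whole context
    short_logits -- next-token logits given only the most recent tokens
    alpha        -- hypothetical boosting strength; 0 recovers the full-context model
    """
    full = log_softmax(full_logits)
    short = log_softmax(short_logits)
    combined = [(1 + alpha) * f - alpha * s for f, s in zip(full, short)]
    return log_softmax(combined)  # renormalize to a valid distribution

# Toy three-token vocabulary (made-up numbers): the short context
# strongly favors token 0, while the full context only slightly does.
full = [2.0, 1.9, 0.0]
short = [2.5, 0.5, 0.0]

plain = boost(full, short, alpha=0.0)
boosted = boost(full, short, alpha=1.0)
print(plain.index(max(plain)), boosted.index(max(boosted)))
```

In this toy case the contrast down-weights the token that the short context alone already predicts, so the preferred token changes once `alpha` is large enough. With a real model, the two logit vectors would come from two forward passes over contexts of different lengths.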


GPT Perdetry Test: Generating new meanings for new words
Nikolay Malkin | Sameera Lanka | Pranav Goel | Sudha Rao | Nebojsa Jojic
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Human innovation in language, such as inventing new words, is a challenge for pretrained language models. We assess the ability of one large model, GPT-3, to process new words and decide on their meaning. We create a set of nonce words and prompt GPT-3 to generate their dictionary definitions. We find GPT-3 produces plausible definitions that align with human judgments. Moreover, GPT-3’s definitions are sometimes preferred to those invented by humans, signaling its intriguing ability not just to adapt, but to add to the evolving vocabulary of the English language.

Studying word order through iterative shuffling
Nikolay Malkin | Sameera Lanka | Pranav Goel | Nebojsa Jojic
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

As neural language models approach human performance on NLP benchmark tasks, their advances are widely seen as evidence of an increasingly complex understanding of syntax. This view rests upon a hypothesis that has not yet been empirically tested: that word order encodes meaning essential to performing these tasks. We refute this hypothesis in many cases: in the GLUE suite and in various genres of English text, the words in a sentence or phrase can rarely be permuted to form a phrase carrying substantially different information. Our surprising result relies on inference by iterative shuffling (IBIS), a novel, efficient procedure that finds the ordering of a bag of words having the highest likelihood under a fixed language model. IBIS can use any black-box model without additional training and is superior to existing word ordering algorithms. Coalescing our findings, we discuss how shuffling inference procedures such as IBIS can benefit language modeling and constrained generation.
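
The objective named in the abstract above, finding the ordering of a bag of words with the highest likelihood under a fixed language model, can be illustrated with a brute-force search. This is not IBIS: it enumerates every permutation, which is only feasible for a handful of words. The bigram table `BIGRAM`, the fallback `DEFAULT`, and the function names are hypothetical stand-ins for a real language model.

```python
from itertools import permutations

# Hypothetical bigram log-scores standing in for a fixed language model.
BIGRAM = {
    ("<s>", "the"): -0.5,
    ("the", "dog"): -0.7,
    ("dog", "barks"): -0.6,
}
DEFAULT = -5.0  # assumed log-score for any bigram not in the table

def score(words):
    """Sum of bigram log-scores for the sequence, with a start marker."""
    tokens = ["<s>", *words]
    return sum(BIGRAM.get(pair, DEFAULT) for pair in zip(tokens, tokens[1:]))

def best_order(bag):
    """Exhaustive search: return the highest-scoring permutation of the bag."""
    return max(permutations(bag), key=score)

print(best_order(["barks", "the", "dog"]))
```

The search space grows factorially with the number of words, which is why an efficient procedure is needed for sentences of realistic length.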