Animashree Anandkumar

Also published as: Anima Anandkumar


pdf bib
MEGATRON-CNTRL: Controllable Story Generation with External Knowledge Using Large-Scale Language Models
Peng Xu | Mostofa Patwary | Mohammad Shoeybi | Raul Puri | Pascale Fung | Anima Anandkumar | Bryan Catanzaro
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Existing pre-trained large language models have shown unparalleled generative capabilities. However, they are not controllable. In this paper, we propose MEGATRON-CNTRL, a novel framework that uses large-scale language models and adds control to text generation by incorporating an external knowledge base. Our framework consists of a keyword predictor, a knowledge retriever, a contextual knowledge ranker, and a conditional text generator. As we do not have access to ground-truth supervision for the knowledge ranker, we make use of weak supervision from sentence embedding. The empirical results show that our model generates more fluent, consistent, and coherent stories with less repetition and higher diversity compared to prior work on the ROC story dataset. We showcase the controllability of our model by replacing the keywords used to generate stories and re-running the generation process. Human evaluation results show that 77.5% of these stories are successfully controlled by the new keywords. Furthermore, by scaling our model from 124 million to 8.3 billion parameters we demonstrate that larger models improve both the quality of generation (from 74.5% to 93.0% for consistency) and controllability (from 77.5% to 91.5%).


pdf bib
Multi Sense Embeddings from Topic Models
Shobhit Jain | Sravan Babu Bodapati | Ramesh Nallapati | Anima Anandkumar
Proceedings of the 3rd International Conference on Natural Language and Speech Processing


pdf bib
Probabilistic FastText for Multi-Sense Word Embeddings
Ben Athiwaratkun | Andrew Wilson | Anima Anandkumar
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We introduce Probabilistic FastText, a new model for word embeddings that can capture multiple word senses, sub-word structure, and uncertainty information. In particular, we represent each word with a Gaussian mixture density, where the mean of a mixture component is given by the sum of n-grams. This representation allows the model to share the “strength” across sub-word structures (e.g. Latin roots), producing accurate representations of rare, misspelt, or even unseen words. Moreover, each component of the mixture can capture a different word sense. Probabilistic FastText outperforms both FastText, which has no probabilistic model, and dictionary-level probabilistic embeddings, which do not incorporate subword structures, on several word-similarity benchmarks, including English RareWord and foreign language datasets. We also achieve state-of-art performance on benchmarks that measure ability to discern different meanings. Thus, our model is the first to achieve best of both the worlds: multi-sense representations while having enriched semantics on rare words.


pdf bib
Deep Active Learning for Named Entity Recognition
Yanyao Shen | Hyokun Yun | Zachary Lipton | Yakov Kronrod | Animashree Anandkumar
Proceedings of the 2nd Workshop on Representation Learning for NLP

Deep neural networks have advanced the state of the art in named entity recognition. However, under typical training procedures, advantages over classical methods emerge only with large datasets. As a result, deep learning is employed only when large public datasets or a large budget for manually labeling data is available. In this work, we show otherwise: by combining deep learning with active learning, we can outperform classical methods even with a significantly smaller amount of training data.