Pragmatic Issue-Sensitive Image Captioning
Allen Nie | Reuben Cohn-Gordon | Christopher Potts
Findings of the Association for Computational Linguistics: EMNLP 2020
Image captioning systems need to produce texts that are not only true but also relevant in that they are properly aligned with the current issues. For instance, in a newspaper article about a sports event, a caption that not only identifies the player in a picture but also comments on their ethnicity could create unwanted reader reactions. To address this, we propose Issue-Sensitive Image Captioning (ISIC). In ISIC, the captioner is given a target image and an issue, which is a set of images partitioned in a way that specifies what information is relevant. For the sports article, we could construct a partition that places images into equivalence classes based on player position. To model this task, we use an extension of the Rational Speech Acts model. Our extension is built on top of state-of-the-art pretrained neural image captioners and explicitly uses image partitions to control caption generation. In both automatic and human evaluations, we show that these models generate captions that are descriptive and issue-sensitive. Finally, we show how ISIC can complement and enrich the related task of Visual Question Answering.
Inducing Grammar from Long Short-Term Memory Networks by Shapley Decomposition
Yuhui Zhang | Allen Nie
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
The principle of compositionality has deep roots in linguistics: the meaning of an expression is determined by its structure and the meanings of its constituents. However, modern neural network models such as long short-term memory network process expressions in a linear fashion and do not seem to incorporate more complex compositional patterns. In this work, we show that we can explicitly induce grammar by tracing the computational process of a long short-term memory network. We show: (i) the multiplicative nature of long short-term memory network allows complex interaction beyond sequential linear combination; (ii) we can generate compositional trees from the network without external linguistic knowledge; (iii) we evaluate the syntactic difference between the generated trees, randomly generated trees and gold reference trees produced by constituency parsers; (iv) we evaluate whether the generated trees contain the rich semantic information.
Learning to Explain: Answering Why-Questions via Rephrasing
Allen Nie | Erin Bennett | Noah Goodman
Proceedings of the First Workshop on NLP for Conversational AI
Providing plausible responses to why questions is a challenging but critical goal for language based human-machine interaction. Explanations are challenging in that they require many different forms of abstract knowledge and reasoning. Previous work has either relied on human-curated structured knowledge bases or detailed domain representation to generate satisfactory explanations. They are also often limited to ranking pre-existing explanation choices. In our work, we contribute to the under-explored area of generating natural language explanations for general phenomena. We automatically collect large datasets of explanation-phenomenon pairs which allow us to train sequence-to-sequence models to generate natural language explanations. We compare different training strategies and evaluate their performance using both automatic scores and human ratings. We demonstrate that our strategy is sufficient to generate highly plausible explanations for general open-domain phenomena compared to other models trained on different datasets.
DisSent: Learning Sentence Representations from Explicit Discourse Relations
Allen Nie | Erin Bennett | Noah Goodman
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Learning effective representations of sentences is one of the core missions of natural language understanding. Existing models either train on a vast amount of text, or require costly, manually curated sentence relation datasets. We show that with dependency parsing and rule-based rubrics, we can curate a high quality sentence relation task by leveraging explicit discourse relations. We show that our curated dataset provides an excellent signal for learning vector representations of sentence meaning, representing relations that can only be determined when the meanings of two sentences are combined. We demonstrate that the automatically curated corpus allows a bidirectional LSTM sentence encoder to yield high quality sentence embeddings and can serve as a supervised fine-tuning dataset for larger models such as BERT. Our fixed sentence embeddings achieve high performance on a variety of transfer tasks, including SentEval, and we achieve state-of-the-art results on Penn Discourse Treebank’s implicit relation prediction task.