Kilian Weinberger


2024

pdf bib
Re-evaluating the Need for Visual Signals in Unsupervised Grammar Induction
Boyi Li | Rodolfo Corona | Karttikeya Mangalam | Catherine Chen | Daniel Flaherty | Serge Belongie | Kilian Weinberger | Jitendra Malik | Trevor Darrell | Dan Klein
Findings of the Association for Computational Linguistics: NAACL 2024

Are multimodal inputs necessary for grammar induction? Recent work has shown that multimodal training inputs can improve grammar induction. However, these improvements are based on comparisons to weak text-only baselines that were trained on relatively little textual data. To determine whether multimodal inputs are needed in regimes with large amounts of textual training data, we design a stronger text-only baseline, which we refer to as LC-PCFG. LC-PCFG is a C-PFCG that incorporates embeddings from text-only large language models (LLMs). We use a fixed grammar family to directly compare LC-PCFG to various multimodal grammar induction methods. We compare performance on four benchmark datasets. LC-PCFG provides an up to 17% relative improvement in Corpus-F1 compared to state-of-the-art multimodal grammar induction methods. LC-PCFG is also more computationally efficient, providing an up to 85% reduction in parameter count and 8.8× reduction in training time compared to multimodal approaches. These results suggest that multimodal inputs may not be necessary for grammar induction, and emphasize the importance of strong vision-free baselines for evaluating the benefit of multimodal approaches.

2022

pdf bib
Long-term Control for Dialogue Generation: Methods and Evaluation
Ramya Ramakrishnan | Hashan Narangodage | Mauro Schilman | Kilian Weinberger | Ryan McDonald
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Current approaches for controlling dialogue response generation are primarily focused on high-level attributes like style, sentiment, or topic. In this work, we focus on constrained long-term dialogue generation, which involves more fine-grained control and requires a given set of control words to appear in generated responses. This setting requires a model to not only consider the generation of these control words in the immediate context, but also produce utterances that will encourage the generation of the words at some time in the (possibly distant) future. We define the problem of constrained long-term control for dialogue generation, identify gaps in current methods for evaluation, and propose new metrics that better measure long-term control. We also propose a retrieval-augmented method that improves performance of long-term controlled generation via logit modification techniques. We show through experiments on three task-oriented dialogue datasets that our metrics better assess dialogue control relative to current alternatives and that our method outperforms state-of-the-art constrained generation baselines.

2018

pdf bib
Adversarial Deep Averaging Networks for Cross-Lingual Sentiment Classification
Xilun Chen | Yu Sun | Ben Athiwaratkun | Claire Cardie | Kilian Weinberger
Transactions of the Association for Computational Linguistics, Volume 6

In recent years great success has been achieved in sentiment classification for English, thanks in part to the availability of copious annotated resources. Unfortunately, most languages do not enjoy such an abundance of labeled data. To tackle the sentiment classification problem in low-resource languages without adequate annotated data, we propose an Adversarial Deep Averaging Network (ADAN1) to transfer the knowledge learned from labeled data on a resource-rich source language to low-resource languages where only unlabeled data exist. ADAN has two discriminative branches: a sentiment classifier and an adversarial language discriminator. Both branches take input from a shared feature extractor to learn hidden representations that are simultaneously indicative for the classification task and invariant across languages. Experiments on Chinese and Arabic sentiment classification demonstrate that ADAN significantly outperforms state-of-the-art systems.