Austin Myers


pdf bib
Distribution Aware Metrics for Conditional Natural Language Generation
David M. Chan | Yiming Ni | David Ross | Sudheendra Vijayanarasimhan | Austin Myers | John Canny
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Traditional automated metrics for evaluating conditional natural language generation rely on pairwise comparisons between a single generated text and the best-matching gold-standard reference. This method is effective when ground truth data diversity can be attributed to noise, however, it falls short when diversity in references holds valuable contextual information, as in visual description or summarization, as it does not evaluate the ability of a model to generate text matching the diversity of the ground truth samples. In this paper, we challenge the adequacy of existing metrics in such semantically diverse contexts and introduce a novel approach for evaluating conditional language generation models, leveraging a family of meta-metrics that build on existing pairwise distance functions. These meta-metrics assess not just single-samples, but distributions of reference and model-generated captions using small sample sets. We demonstrate our approach through a case study of visual description in the English language which reveals not only how current models prioritize single-description quality over diversity, but further sheds light on the impact of sampling methods and temperature settings on description quality and diversity.


pdf bib
IC3: Image Captioning by Committee Consensus
David Chan | Austin Myers | Sudheendra Vijayanarasimhan | David Ross | John Canny
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

If you ask a human to describe an image, they might do so in a thousand different ways. Traditionally, image captioning models are trained to generate a single “best’ (most like a reference) image caption. Unfortunately, doing so encourages captions that are “informationally impoverished,’ and focus on only a subset of the possible details, while ignoring other potentially useful information in the scene. In this work, we introduce a simple, yet novel, method: “Image Captioning by Committee Consensus’ (IC3), designed to generate a single caption that captures high-level details from several annotator viewpoints. Humans rate captions produced by IC3 at least as helpful as baseline SOTA models more than two thirds of the time, and IC3 can improve the performance of SOTA automated recall systems by up to 84%, outperforming single human-generated reference captions, and indicating significant improvements over SOTA approaches for visual description. Code is available at [](