Xiaoqing Tan
2024
Altogether: Image Captioning via Re-aligning Alt-text
Hu Xu | Po-Yao Huang | Xiaoqing Tan | Ching-Feng Yeh | Jacob Kahn | Christine Jou | Gargi Ghosh | Omer Levy | Luke Zettlemoyer | Wen-tau Yih | Shang-Wen Li | Saining Xie | Christoph Feichtenhofer
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
This paper focuses on creating synthetic data to improve the quality of image captions. Existing works typically have two shortcomings. First, they caption images from scratch, ignoring existing alt-text metadata; second, they lack transparency when the captioners’ training data (e.g., GPT) is unknown. In this paper, we study a principled approach, Altogether, based on the key idea of editing and re-aligning the existing alt-texts associated with the images. To generate training data, we perform human annotation in which annotators start with the existing alt-text and re-align it to the image content over multiple rounds, thereby constructing captions with rich visual concepts. This differs from prior work that carries out human annotation as a one-time description task based solely on images and annotator knowledge. We train a captioner on this data that generalizes the process of re-aligning alt-texts at scale. Our results show that the Altogether approach leads to richer image captions that also improve text-to-image generation and zero-shot image classification.
2023
ROBBIE: Robust Bias Evaluation of Large Generative Language Models
David Esiobu | Xiaoqing Tan | Saghar Hosseini | Megan Ung | Yuchen Zhang | Jude Fernandes | Jane Dwivedi-Yu | Eleonora Presani | Adina Williams | Eric Smith
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
As generative large language models (LLMs) grow more performant and prevalent, we must develop sufficiently comprehensive tools to measure and improve their fairness. Different prompt-based datasets can be used to measure social bias across multiple text domains and demographic axes, meaning that testing LLMs on more datasets can potentially help us characterize their biases more fully and better ensure equal and equitable treatment of marginalized demographic groups. In this work, our focus is two-fold: (1) Benchmarking: a comparison of 6 different prompt-based bias and toxicity metrics across 12 demographic axes and 5 families of generative LLMs. Of those 6 metrics, AdvPromptSet and HolisticBiasR are novel datasets proposed in the paper. The comparison of those benchmarks gives us insights into the bias and toxicity of the compared models. Therefore, we explore the frequency of demographic terms in common LLM pre-training corpora and how this may relate to model biases. (2) Mitigation: we conduct a comprehensive study of how well 3 bias/toxicity mitigation techniques perform across our suite of measurements. ROBBIE aims to provide insights for practitioners deploying a model, emphasizing the need not only to measure potential harms, but also to understand how they arise by characterizing the data, mitigate harms once found, and balance any trade-offs. We open-source our analysis code in hopes of encouraging broader measurements of bias in future LLMs.