Jessica Zosa Forde


2025

pdf bib
SHADES: Towards a Multilingual Assessment of Stereotypes in Large Language Models
Margaret Mitchell | Giuseppe Attanasio | Ioana Baldini | Miruna Clinciu | Jordan Clive | Pieter Delobelle | Manan Dey | Sil Hamilton | Timm Dill | Jad Doughman | Ritam Dutt | Avijit Ghosh | Jessica Zosa Forde | Carolin Holtermann | Lucie-Aimée Kaffee | Tanmay Laud | Anne Lauscher | Roberto L Lopez-Davila | Maraim Masoud | Nikita Nangia | Anaelia Ovalle | Giada Pistilli | Dragomir Radev | Beatrice Savoldi | Vipul Raheja | Jeremy Qin | Esther Ploeger | Arjun Subramonian | Kaustubh Dhole | Kaiser Sun | Amirbek Djanibekov | Jonibek Mansurov | Kayo Yin | Emilio Villa Cueva | Sagnik Mukherjee | Jerry Huang | Xudong Shen | Jay Gala | Hamdan Al-Ali | Tair Djanibekov | Nurdaulet Mukhituly | Shangrui Nie | Shanya Sharma | Karolina Stanczak | Eliza Szczechla | Tiago Timponi Torrent | Deepak Tunuguntla | Marcelo Viridiano | Oskar Van Der Wal | Adina Yakefu | Aurélie Névéol | Mike Zhang | Sydney Zink | Zeerak Talat
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Large Language Models (LLMs) reproduce and exacerbate the social biases present in their training data, and resources to quantify this issue are limited. While research has attempted to identify and mitigate such biases, most efforts have been concentrated around English, lagging the rapid advancement of LLMs in multilingual settings. In this paper, we introduce a new multilingual parallel dataset SHADES to help address this issue, designed for examining culturally-specific stereotypes that may be learned by LLMs. The dataset includes stereotypes from 20 regions around the world and 16 languages, spanning multiple identity categories subject to discrimination worldwide. We demonstrate its utility in a series of exploratory evaluations for both “base” and “instruction-tuned” language models. Our results suggest that stereotypes are consistently reflected across models and languages, with some languages and models indicating much stronger stereotype biases than others.

2024

pdf bib
Re-Evaluating Evaluation for Multilingual Summarization
Jessica Zosa Forde | Ruochen Zhang | Lintang Sutawika | Alham Fikri Aji | Samuel Cahyawijaya | Genta Indra Winata | Minghao Wu | Carsten Eickhoff | Stella Biderman | Ellie Pavlick
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Automatic evaluation approaches (ROUGE, BERTScore, LLM-based evaluators) have been widely used to evaluate summarization tasks. Despite the complexities of script differences and tokenization, these approaches have been indiscriminately applied to summarization across multiple languages. While previous works have argued that these approaches correlate strongly with human ratings in English, it remains unclear whether the conclusion holds for other languages. To answer this question, we construct a small-scale pilot dataset containing article-summary pairs and human ratings in English, Chinese and Indonesian. To measure the strength of summaries, our ratings are measured as head-to-head comparisons with resulting Elo scores across four dimensions. Our analysis reveals that standard metrics are unreliable measures of quality, and that these problems are exacerbated in Chinese and Indonesian. We advocate for more nuanced and careful considerations in designing a robust evaluation framework for multiple languages.

2023

pdf bib
Current Status of NLP in South East Asia with Insights from Multilingualism and Language Diversity
Alham Fikri Aji | Jessica Zosa Forde | Alyssa Marie Loo | Lintang Sutawika | Skyler Wang | Genta Indra Winata | Zheng-Xin Yong | Ruochen Zhang | A. Seza Doğruöz | Yin Lin Tan | Jan Christian Blaise Cruz
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics: Tutorial Abstract

pdf bib
Efficient Methods for Natural Language Processing: A Survey
Marcos Treviso | Ji-Ung Lee | Tianchu Ji | Betty van Aken | Qingqing Cao | Manuel R. Ciosici | Michael Hassid | Kenneth Heafield | Sara Hooker | Colin Raffel | Pedro H. Martins | André F. T. Martins | Jessica Zosa Forde | Peter Milder | Edwin Simpson | Noam Slonim | Jesse Dodge | Emma Strubell | Niranjan Balasubramanian | Leon Derczynski | Iryna Gurevych | Roy Schwartz
Transactions of the Association for Computational Linguistics, Volume 11

Recent work in natural language processing (NLP) has yielded appealing results from scaling model parameters and training data; however, using only scale to improve performance means that resource consumption also grows. Such resources include data, time, storage, or energy, all of which are naturally limited and unevenly distributed. This motivates research into efficient methods that require fewer resources to achieve similar results. This survey synthesizes and relates current methods and findings in efficient NLP. We aim to provide both guidance for conducting NLP under limited resources, and point towards promising research directions for developing more efficient methods.
Search Fix data