Scaling Cultural Resources for Improving Generative Models

Hayk Stepanyan; Aishwarya Verma; Andrew Zaldivar; Rutledge Chin Feman; Erin MacMurray van Liemt; Charu Kalia; Vinodkumar Prabhakaran; Sunipa Dev

Abstract

Generative models are known to have reduced performance in different global cultural contexts and languages. While continual data updates have been known to be conducted to improve overall model performance, bolstering and evaluating this cross-cultural competence of generative AI models requires data resources to be intentionally expanded to include global contexts and languages. In this work, we construct a multi-pronged pipeline to collect and contribute culturally salient, multilingual data. We posit that such data can assess the state of the global applicability of our models and thus, in turn, help identify and improve upon cross-cultural gaps.

Anthology ID: 2026.findings-eacl.352
Volume: Findings of the Association for Computational Linguistics: EACL 2026
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Vera Demberg, Kentaro Inui, Lluís Marquez
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 6695–6709
URL: https://aclanthology.org/2026.findings-eacl.352/
Bibkey: stepanyan-etal-2026-scaling
Cite (ACL): Hayk Stepanyan, Aishwarya Verma, Andrew Zaldivar, Rutledge Chin Feman, Erin MacMurray van Liemt, Charu Kalia, Vinodkumar Prabhakaran, and Sunipa Dev. 2026. Scaling Cultural Resources for Improving Generative Models. In Findings of the Association for Computational Linguistics: EACL 2026, pages 6695–6709, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): Scaling Cultural Resources for Improving Generative Models (Stepanyan et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-eacl.352.pdf
Checklist: 2026.findings-eacl.352.checklist.pdf

BibTeX

@inproceedings{stepanyan-etal-2026-scaling,
    title = "Scaling Cultural Resources for Improving Generative Models",
    author = "Stepanyan, Hayk  and
      Verma, Aishwarya  and
      Zaldivar, Andrew  and
      Feman, Rutledge Chin  and
      van Liemt, Erin MacMurray  and
      Kalia, Charu  and
      Prabhakaran, Vinodkumar  and
      Dev, Sunipa",
    editor = "Demberg, Vera  and
      Inui, Kentaro  and
      Marquez, Llu{\'i}s",
    booktitle = "Findings of the {A}ssociation for {C}omputational {L}inguistics: {EACL} 2026",
    month = mar,
    year = "2026",
    address = "Rabat, Morocco",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2026.findings-eacl.352/",
    pages = "6695--6709",
    ISBN = "979-8-89176-386-9",
    abstract = "Generative models are known to have reduced performance in different global cultural contexts and languages. While continual data updates have been known to be conducted to improve overall model performance, bolstering and evaluating this cross-cultural competence of generative AI models requires data resources to be intentionally expanded to include global contexts and languages. In this work, we construct a multi-pronged pipeline to collect and contribute culturally salient, multilingual data. We posit that such data can assess the state of the global applicability of our models and thus, in turn, help identify and improve upon cross-cultural gaps."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="stepanyan-etal-2026-scaling">
    <titleInfo>
        <title>Scaling Cultural Resources for Improving Generative Models</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Hayk</namePart>
        <namePart type="family">Stepanyan</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Aishwarya</namePart>
        <namePart type="family">Verma</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Andrew</namePart>
        <namePart type="family">Zaldivar</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Rutledge</namePart>
        <namePart type="given">Chin</namePart>
        <namePart type="family">Feman</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Erin</namePart>
        <namePart type="given">MacMurray</namePart>
        <namePart type="family">van Liemt</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Charu</namePart>
        <namePart type="family">Kalia</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Vinodkumar</namePart>
        <namePart type="family">Prabhakaran</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Sunipa</namePart>
        <namePart type="family">Dev</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2026-03</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Findings of the Association for Computational Linguistics: EACL 2026</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Vera</namePart>
            <namePart type="family">Demberg</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Kentaro</namePart>
            <namePart type="family">Inui</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Lluís</namePart>
            <namePart type="family">Marquez</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Rabat, Morocco</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
        <identifier type="isbn">979-8-89176-386-9</identifier>
    </relatedItem>
    <abstract>Generative models are known to have reduced performance in different global cultural contexts and languages. While continual data updates have been known to be conducted to improve overall model performance, bolstering and evaluating this cross-cultural competence of generative AI models requires data resources to be intentionally expanded to include global contexts and languages. In this work, we construct a multi-pronged pipeline to collect and contribute culturally salient, multilingual data. We posit that such data can assess the state of the global applicability of our models and thus, in turn, help identify and improve upon cross-cultural gaps.</abstract>
    <identifier type="citekey">stepanyan-etal-2026-scaling</identifier>
    <location>
        <url>https://aclanthology.org/2026.findings-eacl.352/</url>
    </location>
    <part>
        <date>2026-03</date>
        <extent unit="page">
            <start>6695</start>
            <end>6709</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T Scaling Cultural Resources for Improving Generative Models
%A Stepanyan, Hayk
%A Verma, Aishwarya
%A Zaldivar, Andrew
%A Feman, Rutledge Chin
%A van Liemt, Erin MacMurray
%A Kalia, Charu
%A Prabhakaran, Vinodkumar
%A Dev, Sunipa
%Y Demberg, Vera
%Y Inui, Kentaro
%Y Marquez, Lluís
%S Findings of the Association for Computational Linguistics: EACL 2026
%D 2026
%8 March
%I Association for Computational Linguistics
%C Rabat, Morocco
%@ 979-8-89176-386-9
%F stepanyan-etal-2026-scaling
%X Generative models are known to have reduced performance in different global cultural contexts and languages. While continual data updates have been known to be conducted to improve overall model performance, bolstering and evaluating this cross-cultural competence of generative AI models requires data resources to be intentionally expanded to include global contexts and languages. In this work, we construct a multi-pronged pipeline to collect and contribute culturally salient, multilingual data. We posit that such data can assess the state of the global applicability of our models and thus, in turn, help identify and improve upon cross-cultural gaps.
%U https://aclanthology.org/2026.findings-eacl.352/
%P 6695-6709

Markdown (Informal)

[Scaling Cultural Resources for Improving Generative Models](https://aclanthology.org/2026.findings-eacl.352/) (Stepanyan et al., Findings 2026)