Disentangling Biased Representations: A Causal Intervention Framework for Fairer NLP Models

Yangge Qian; Yilong Hu; Siqi Zhang; Xu Gu; Xiaolin Qin

doi:10.18653/v1/2025.gebnlp-1.33

Abstract

Natural language processing (NLP) systems often inadvertently encode and amplify social biases through entangled representations of demographic attributes and task-related attributes. To mitigate this, we propose a novel framework that combines causal analysis with practical intervention strategies. The method leverages attribute-specific prompting to isolate sensitive attributes while applying information-theoretic constraints to minimize spurious correlations. Experiments across six language models and two classification tasks demonstrate its effectiveness. We hope this work will provide the NLP community with a causal disentanglement perspective for achieving fairness in NLP systems.

Anthology ID: 2025.gebnlp-1.33
Volume: Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP)
Month: August
Year: 2025
Address: Vienna, Austria
Editors: Agnieszka Faleńska, Christine Basta, Marta Costa-jussà, Karolina Stańczak, Debora Nozza
Venues: GeBNLP | WS
Publisher: Association for Computational Linguistics
Pages: 393–402
URL: https://aclanthology.org/2025.gebnlp-1.33/
DOI: 10.18653/v1/2025.gebnlp-1.33
Bibkey: qian-etal-2025-disentangling
Cite (ACL): Yangge Qian, Yilong Hu, Siqi Zhang, Xu Gu, and Xiaolin Qin. 2025. Disentangling Biased Representations: A Causal Intervention Framework for Fairer NLP Models. In Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP), pages 393–402, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Disentangling Biased Representations: A Causal Intervention Framework for Fairer NLP Models (Qian et al., GeBNLP 2025)
PDF: https://aclanthology.org/2025.gebnlp-1.33.pdf

BibTeX

@inproceedings{qian-etal-2025-disentangling,
    title = "Disentangling Biased Representations: A Causal Intervention Framework for Fairer {NLP} Models",
    author = "Qian, Yangge  and
      Hu, Yilong  and
      Zhang, Siqi  and
      Gu, Xu  and
      Qin, Xiaolin",
    editor = "Fale{\'n}ska, Agnieszka  and
      Basta, Christine  and
      Costa-juss{\`a}, Marta  and
      Sta{\'n}czak, Karolina  and
      Nozza, Debora",
    booktitle = "Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP)",
    month = aug,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.gebnlp-1.33/",
    doi = "10.18653/v1/2025.gebnlp-1.33",
    pages = "393--402",
    ISBN = "979-8-89176-277-0",
    abstract = "Natural language processing (NLP) systems often inadvertently encode and amplify social biases through entangled representations of demographic attributes and task-related attributes. To mitigate this, we propose a novel framework that combines causal analysis with practical intervention strategies. The method leverages attribute-specific prompting to isolate sensitive attributes while applying information-theoretic constraints to minimize spurious correlations. Experiments across six language models and two classification tasks demonstrate its effectiveness. We hope this work will provide the NLP community with a causal disentanglement perspective for achieving fairness in NLP systems."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="qian-etal-2025-disentangling">
    <titleInfo>
        <title>Disentangling Biased Representations: A Causal Intervention Framework for Fairer NLP Models</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Yangge</namePart>
        <namePart type="family">Qian</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Yilong</namePart>
        <namePart type="family">Hu</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Siqi</namePart>
        <namePart type="family">Zhang</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Xu</namePart>
        <namePart type="family">Gu</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Xiaolin</namePart>
        <namePart type="family">Qin</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2025-08</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP)</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Agnieszka</namePart>
            <namePart type="family">Faleńska</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Christine</namePart>
            <namePart type="family">Basta</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Marta</namePart>
            <namePart type="family">Costa-jussà</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Karolina</namePart>
            <namePart type="family">Stańczak</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Debora</namePart>
            <namePart type="family">Nozza</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Vienna, Austria</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
        <identifier type="isbn">979-8-89176-277-0</identifier>
    </relatedItem>
    <abstract>Natural language processing (NLP) systems often inadvertently encode and amplify social biases through entangled representations of demographic attributes and task-related attributes. To mitigate this, we propose a novel framework that combines causal analysis with practical intervention strategies. The method leverages attribute-specific prompting to isolate sensitive attributes while applying information-theoretic constraints to minimize spurious correlations. Experiments across six language models and two classification tasks demonstrate its effectiveness. We hope this work will provide the NLP community with a causal disentanglement perspective for achieving fairness in NLP systems.</abstract>
    <identifier type="citekey">qian-etal-2025-disentangling</identifier>
    <identifier type="doi">10.18653/v1/2025.gebnlp-1.33</identifier>
    <location>
        <url>https://aclanthology.org/2025.gebnlp-1.33/</url>
    </location>
    <part>
        <date>2025-08</date>
        <extent unit="page">
            <start>393</start>
            <end>402</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T Disentangling Biased Representations: A Causal Intervention Framework for Fairer NLP Models
%A Qian, Yangge
%A Hu, Yilong
%A Zhang, Siqi
%A Gu, Xu
%A Qin, Xiaolin
%Y Faleńska, Agnieszka
%Y Basta, Christine
%Y Costa-jussà, Marta
%Y Stańczak, Karolina
%Y Nozza, Debora
%S Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP)
%D 2025
%8 August
%I Association for Computational Linguistics
%C Vienna, Austria
%@ 979-8-89176-277-0
%F qian-etal-2025-disentangling
%X Natural language processing (NLP) systems often inadvertently encode and amplify social biases through entangled representations of demographic attributes and task-related attributes. To mitigate this, we propose a novel framework that combines causal analysis with practical intervention strategies. The method leverages attribute-specific prompting to isolate sensitive attributes while applying information-theoretic constraints to minimize spurious correlations. Experiments across six language models and two classification tasks demonstrate its effectiveness. We hope this work will provide the NLP community with a causal disentanglement perspective for achieving fairness in NLP systems.
%R 10.18653/v1/2025.gebnlp-1.33
%U https://aclanthology.org/2025.gebnlp-1.33/
%U https://doi.org/10.18653/v1/2025.gebnlp-1.33
%P 393-402

Markdown (Informal)

[Disentangling Biased Representations: A Causal Intervention Framework for Fairer NLP Models](https://aclanthology.org/2025.gebnlp-1.33/) (Qian et al., GeBNLP 2025)
