Sequence-level Large Language Model Training with Contrastive Preference Optimization

Zhili Feng; Dhananjay Ram; Cole Hawkins; Aditya Rawal; Jinman Zhao; Sheng Zha

doi:10.18653/v1/2025.findings-naacl.233

Abstract

The next token prediction loss is the dominant self-supervised training objective for large language models and has achieved promising results in a variety of downstream tasks. However, upon closer investigation of this objective, we find that it lacks an understanding of sequence-level signals, leading to a mismatch between training and inference processes. To bridge this gap, we introduce a contrastive preference optimization (CPO) procedure that can inject sequence-level information into the language model at any training stage without expensive human labeled data. Our experiments show that the proposed objective surpasses the next token prediction in terms of win rate in the instruction-following and text generation tasks.

Anthology ID: 2025.findings-naacl.233
Volume: Findings of the Association for Computational Linguistics: NAACL 2025
Month: April
Year: 2025
Address: Albuquerque, New Mexico
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 4158–4164
URL: https://aclanthology.org/2025.findings-naacl.233/
DOI: 10.18653/v1/2025.findings-naacl.233
Bibkey: feng-etal-2025-sequence
Cite (ACL): Zhili Feng, Dhananjay Ram, Cole Hawkins, Aditya Rawal, Jinman Zhao, and Sheng Zha. 2025. Sequence-level Large Language Model Training with Contrastive Preference Optimization. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 4158–4164, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): Sequence-level Large Language Model Training with Contrastive Preference Optimization (Feng et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-naacl.233.pdf

@inproceedings{feng-etal-2025-sequence,
    title = "Sequence-level Large Language Model Training with Contrastive Preference Optimization",
    author = "Feng, Zhili  and
      Ram, Dhananjay  and
      Hawkins, Cole  and
      Rawal, Aditya  and
      Zhao, Jinman  and
      Zha, Sheng",
    editor = "Chiruzzo, Luis  and
      Ritter, Alan  and
      Wang, Lu",
    booktitle = "Findings of the Association for Computational Linguistics: NAACL 2025",
    month = apr,
    year = "2025",
    address = "Albuquerque, New Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-naacl.233/",
    doi = "10.18653/v1/2025.findings-naacl.233",
    pages = "4158--4164",
    ISBN = "979-8-89176-195-7",
    abstract = "The next token prediction loss is the dominant self-supervised training objective for large language models and has achieved promising results in a variety of downstream tasks. However, upon closer investigation of this objective, we find that it lacks an understanding of sequence-level signals, leading to a mismatch between training and inference processes. To bridge this gap, we introduce a contrastive preference optimization (CPO) procedure that can inject sequence-level information into the language model at any training stage without expensive human labeled data. Our experiments show that the proposed objective surpasses the next token prediction in terms of win rate in the instruction-following and text generation tasks."
}

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="feng-etal-2025-sequence">
    <titleInfo>
        <title>Sequence-level Large Language Model Training with Contrastive Preference Optimization</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Zhili</namePart>
        <namePart type="family">Feng</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Dhananjay</namePart>
        <namePart type="family">Ram</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Cole</namePart>
        <namePart type="family">Hawkins</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Aditya</namePart>
        <namePart type="family">Rawal</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Jinman</namePart>
        <namePart type="family">Zhao</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Sheng</namePart>
        <namePart type="family">Zha</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2025-04</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Findings of the Association for Computational Linguistics: NAACL 2025</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Luis</namePart>
            <namePart type="family">Chiruzzo</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Alan</namePart>
            <namePart type="family">Ritter</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Lu</namePart>
            <namePart type="family">Wang</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Albuquerque, New Mexico</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
        <identifier type="isbn">979-8-89176-195-7</identifier>
    </relatedItem>
    <abstract>The next token prediction loss is the dominant self-supervised training objective for large language models and has achieved promising results in a variety of downstream tasks. However, upon closer investigation of this objective, we find that it lacks an understanding of sequence-level signals, leading to a mismatch between training and inference processes. To bridge this gap, we introduce a contrastive preference optimization (CPO) procedure that can inject sequence-level information into the language model at any training stage without expensive human labeled data. Our experiments show that the proposed objective surpasses the next token prediction in terms of win rate in the instruction-following and text generation tasks.</abstract>
    <identifier type="citekey">feng-etal-2025-sequence</identifier>
    <identifier type="doi">10.18653/v1/2025.findings-naacl.233</identifier>
    <location>
        <url>https://aclanthology.org/2025.findings-naacl.233/</url>
    </location>
    <part>
        <date>2025-04</date>
        <extent unit="page">
            <start>4158</start>
            <end>4164</end>
        </extent>
    </part>
</mods>
</modsCollection>

%0 Conference Proceedings
%T Sequence-level Large Language Model Training with Contrastive Preference Optimization
%A Feng, Zhili
%A Ram, Dhananjay
%A Hawkins, Cole
%A Rawal, Aditya
%A Zhao, Jinman
%A Zha, Sheng
%Y Chiruzzo, Luis
%Y Ritter, Alan
%Y Wang, Lu
%S Findings of the Association for Computational Linguistics: NAACL 2025
%D 2025
%8 April
%I Association for Computational Linguistics
%C Albuquerque, New Mexico
%@ 979-8-89176-195-7
%F feng-etal-2025-sequence
%X The next token prediction loss is the dominant self-supervised training objective for large language models and has achieved promising results in a variety of downstream tasks. However, upon closer investigation of this objective, we find that it lacks an understanding of sequence-level signals, leading to a mismatch between training and inference processes. To bridge this gap, we introduce a contrastive preference optimization (CPO) procedure that can inject sequence-level information into the language model at any training stage without expensive human labeled data. Our experiments show that the proposed objective surpasses the next token prediction in terms of win rate in the instruction-following and text generation tasks.
%R 10.18653/v1/2025.findings-naacl.233
%U https://aclanthology.org/2025.findings-naacl.233/
%U https://doi.org/10.18653/v1/2025.findings-naacl.233
%P 4158-4164

Markdown (Informal)

[Sequence-level Large Language Model Training with Contrastive Preference Optimization](https://aclanthology.org/2025.findings-naacl.233/) (Feng et al., Findings 2025)
