@article{wu-etal-2025-emergence,
title = "The Emergence of Chunking Structures with Hierarchical {RNN}",
author = "Wu, Zijun and
Deshmukh, Anup Anand and
Wu, Yongkang and
Lin, Jimmy and
Mou, Lili",
journal = "Computational Linguistics",
volume = "51",
number = "3",
month = sep,
year = "2025",
address = "Cambridge, MA",
publisher = "MIT Press",
url = "https://aclanthology.org/2025.cl-3.4/",
doi = "10.1162/coli_a_00545",
pages = "815--841",
abstract = "In Natural Language Processing (NLP), predicting linguistic structures, such as parsing and chunking, has mostly relied on manual annotations of syntactic structures. This article introduces an unsupervised approach to chunking, a syntactic task that involves grouping words in a non-hierarchical manner. We present a Hierarchical Recurrent Neural Network (HRNN) designed to model word-to-chunk and chunk-to-sentence compositions. Our approach involves a two-stage training process: pretraining with an unsupervised parser and finetuning on downstream NLP tasks. Experiments on multiple datasets reveal a notable improvement of unsupervised chunking performance in both pretraining and finetuning stages. Interestingly, we observe that the emergence of the chunking structure is transient during the neural model{'}s downstream-task training. This study contributes to the advancement of unsupervised syntactic structure discovery and opens avenues for further research in linguistic theory."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="wu-etal-2025-emergence">
<titleInfo>
<title>The Emergence of Chunking Structures with Hierarchical RNN</title>
</titleInfo>
<name type="personal">
<namePart type="given">Zijun</namePart>
<namePart type="family">Wu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Anup</namePart>
<namePart type="given">Anand</namePart>
<namePart type="family">Deshmukh</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yongkang</namePart>
<namePart type="family">Wu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jimmy</namePart>
<namePart type="family">Lin</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Lili</namePart>
<namePart type="family">Mou</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-09</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<genre authority="bibutilsgt">journal article</genre>
<relatedItem type="host">
<titleInfo>
<title>Computational Linguistics</title>
</titleInfo>
<originInfo>
<issuance>continuing</issuance>
<publisher>MIT Press</publisher>
<place>
<placeTerm type="text">Cambridge, MA</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">periodical</genre>
<genre authority="bibutilsgt">academic journal</genre>
</relatedItem>
<abstract>In Natural Language Processing (NLP), predicting linguistic structures, such as parsing and chunking, has mostly relied on manual annotations of syntactic structures. This article introduces an unsupervised approach to chunking, a syntactic task that involves grouping words in a non-hierarchical manner. We present a Hierarchical Recurrent Neural Network (HRNN) designed to model word-to-chunk and chunk-to-sentence compositions. Our approach involves a two-stage training process: pretraining with an unsupervised parser and finetuning on downstream NLP tasks. Experiments on multiple datasets reveal a notable improvement of unsupervised chunking performance in both pretraining and finetuning stages. Interestingly, we observe that the emergence of the chunking structure is transient during the neural model’s downstream-task training. This study contributes to the advancement of unsupervised syntactic structure discovery and opens avenues for further research in linguistic theory.</abstract>
<identifier type="citekey">wu-etal-2025-emergence</identifier>
<identifier type="doi">10.1162/coli_a_00545</identifier>
<location>
<url>https://aclanthology.org/2025.cl-3.4/</url>
</location>
<part>
<date>2025-09</date>
<detail type="volume"><number>51</number></detail>
<detail type="issue"><number>3</number></detail>
<extent unit="page">
<start>815</start>
<end>841</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Journal Article
%T The Emergence of Chunking Structures with Hierarchical RNN
%A Wu, Zijun
%A Deshmukh, Anup Anand
%A Wu, Yongkang
%A Lin, Jimmy
%A Mou, Lili
%J Computational Linguistics
%D 2025
%8 September
%V 51
%N 3
%I MIT Press
%C Cambridge, MA
%F wu-etal-2025-emergence
%X In Natural Language Processing (NLP), predicting linguistic structures, such as parsing and chunking, has mostly relied on manual annotations of syntactic structures. This article introduces an unsupervised approach to chunking, a syntactic task that involves grouping words in a non-hierarchical manner. We present a Hierarchical Recurrent Neural Network (HRNN) designed to model word-to-chunk and chunk-to-sentence compositions. Our approach involves a two-stage training process: pretraining with an unsupervised parser and finetuning on downstream NLP tasks. Experiments on multiple datasets reveal a notable improvement of unsupervised chunking performance in both pretraining and finetuning stages. Interestingly, we observe that the emergence of the chunking structure is transient during the neural model’s downstream-task training. This study contributes to the advancement of unsupervised syntactic structure discovery and opens avenues for further research in linguistic theory.
%R 10.1162/coli_a_00545
%U https://aclanthology.org/2025.cl-3.4/
%U https://doi.org/10.1162/coli_a_00545
%P 815-841
Markdown (Informal)
[The Emergence of Chunking Structures with Hierarchical RNN](https://aclanthology.org/2025.cl-3.4/) (Wu et al., CL 2025)
ACL