Unsupervised Fine-tuning for Text Clustering

Shaohan Huang; Furu Wei; Lei Cui; Xingxing Zhang; Ming Zhou

doi:10.18653/v1/2020.coling-main.482

Abstract

Fine-tuning pre-trained language models (e.g., BERT) has achieved great success on many language understanding tasks in supervised settings (e.g., text classification). However, relatively little work has focused on applying pre-trained models in unsupervised settings such as text clustering. In this paper, we propose a novel method for fine-tuning pre-trained models without supervision for text clustering, which simultaneously learns text representations and cluster assignments using a clustering-oriented loss. Experiments on three text clustering datasets (TREC-6, Yelp, and DBpedia) show that our model outperforms the baseline methods and achieves state-of-the-art results.
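The abstract does not spell out the clustering-oriented loss. Purely as an illustration, the sketch below assumes a DEC-style objective (Deep Embedded Clustering, Xie et al., 2016), which may differ from the authors' formulation: each text embedding z_i receives a Student's t soft assignment q_ij to a cluster centroid mu_j, a sharpened target p_ij is derived from q, and the loss is the mean KL(P || Q). The function names, the toy 2-D vectors, and alpha = 1 are hypothetical. In an actual fine-tuning loop the embeddings would come from the pre-trained encoder (e.g., BERT), P would be held fixed as a target, and gradients of the loss would update both the encoder and the centroids, which is one way to learn representations and cluster assignments at the same time. Pure Python is used only to keep the example self-contained.

```python
import math
from typing import List

Vector = List[float]


def soft_assign(z: List[Vector], centroids: List[Vector], alpha: float = 1.0) -> List[Vector]:
    """Student's t soft assignment q_ij of embedding i to cluster j (DEC-style assumption)."""
    q = []
    for zi in z:
        row = []
        for mu in centroids:
            sq_dist = sum((a - b) ** 2 for a, b in zip(zi, mu))
            row.append((1.0 + sq_dist / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([v / total for v in row])  # normalize over clusters
    return q


def target_distribution(q: List[Vector]) -> List[Vector]:
    """Sharpened targets p_ij proportional to q_ij^2 / f_j, where f_j = sum_i q_ij."""
    k = len(q[0])
    f = [sum(row[j] for row in q) for j in range(k)]  # soft cluster frequencies
    p = []
    for row in q:
        weights = [row[j] ** 2 / f[j] for j in range(k)]
        total = sum(weights)
        p.append([w / total for w in weights])  # normalize over clusters
    return p


def kl_clustering_loss(p: List[Vector], q: List[Vector]) -> float:
    """Mean over examples of KL(P_i || Q_i); P is treated as a fixed target."""
    loss = 0.0
    for p_row, q_row in zip(p, q):
        loss += sum(pi * math.log(pi / qi) for pi, qi in zip(p_row, q_row) if pi > 0)
    return loss / len(q)


# Toy "sentence embeddings" (hypothetical values standing in for encoder outputs).
z = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [4.9, 5.0]]
centroids = [[0.0, 0.0], [5.0, 5.0]]
q = soft_assign(z, centroids)
p = target_distribution(q)
print([max(range(len(r)), key=r.__getitem__) for r in q])  # → [0, 0, 1, 1]
print(kl_clustering_loss(p, q))
```

Squaring q before renormalizing pushes each target toward its most confident cluster, so minimizing KL(P || Q) encourages sharper, more separated assignments; dividing by f_j keeps large clusters from dominating.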

Anthology ID:: 2020.coling-main.482
Volume:: Proceedings of the 28th International Conference on Computational Linguistics
Month:: December
Year:: 2020
Address:: Barcelona, Spain (Online)
Editors:: Donia Scott, Nuria Bel, Chengqing Zong
Venue:: COLING
Publisher:: International Committee on Computational Linguistics
Pages:: 5530–5534
URL:: https://aclanthology.org/2020.coling-main.482/
DOI:: 10.18653/v1/2020.coling-main.482
Bibkey:: huang-etal-2020-unsupervised
Cite (ACL):: Shaohan Huang, Furu Wei, Lei Cui, Xingxing Zhang, and Ming Zhou. 2020. Unsupervised Fine-tuning for Text Clustering. In Proceedings of the 28th International Conference on Computational Linguistics, pages 5530–5534, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal):: Unsupervised Fine-tuning for Text Clustering (Huang et al., COLING 2020)
PDF:: https://aclanthology.org/2020.coling-main.482.pdf

Export citation

BibTeX

@inproceedings{huang-etal-2020-unsupervised,
    title = "Unsupervised Fine-tuning for Text Clustering",
    author = "Huang, Shaohan  and
      Wei, Furu  and
      Cui, Lei  and
      Zhang, Xingxing  and
      Zhou, Ming",
    editor = "Scott, Donia  and
      Bel, Nuria  and
      Zong, Chengqing",
    booktitle = "Proceedings of the 28th International Conference on Computational Linguistics",
    month = dec,
    year = "2020",
    address = "Barcelona, Spain (Online)",
    publisher = "International Committee on Computational Linguistics",
    url = "https://aclanthology.org/2020.coling-main.482/",
    doi = "10.18653/v1/2020.coling-main.482",
    pages = "5530--5534",
    abstract = "Fine-tuning with pre-trained language models (e.g. BERT) has achieved great success in many language understanding tasks in supervised settings (e.g. text classification). However, relatively little work has been focused on applying pre-trained models in unsupervised settings, such as text clustering. In this paper, we propose a novel method to fine-tune pre-trained models unsupervisedly for text clustering, which simultaneously learns text representations and cluster assignments using a clustering oriented loss. Experiments on three text clustering datasets (namely TREC-6, Yelp, and DBpedia) show that our model outperforms the baseline methods and achieves state-of-the-art results."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="huang-etal-2020-unsupervised">
    <titleInfo>
        <title>Unsupervised Fine-tuning for Text Clustering</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Shaohan</namePart>
        <namePart type="family">Huang</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Furu</namePart>
        <namePart type="family">Wei</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Lei</namePart>
        <namePart type="family">Cui</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Xingxing</namePart>
        <namePart type="family">Zhang</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Ming</namePart>
        <namePart type="family">Zhou</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2020-12</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 28th International Conference on Computational Linguistics</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Donia</namePart>
            <namePart type="family">Scott</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Nuria</namePart>
            <namePart type="family">Bel</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Chengqing</namePart>
            <namePart type="family">Zong</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>International Committee on Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Barcelona, Spain (Online)</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Fine-tuning with pre-trained language models (e.g. BERT) has achieved great success in many language understanding tasks in supervised settings (e.g. text classification). However, relatively little work has been focused on applying pre-trained models in unsupervised settings, such as text clustering. In this paper, we propose a novel method to fine-tune pre-trained models unsupervisedly for text clustering, which simultaneously learns text representations and cluster assignments using a clustering oriented loss. Experiments on three text clustering datasets (namely TREC-6, Yelp, and DBpedia) show that our model outperforms the baseline methods and achieves state-of-the-art results.</abstract>
    <identifier type="citekey">huang-etal-2020-unsupervised</identifier>
    <identifier type="doi">10.18653/v1/2020.coling-main.482</identifier>
    <location>
        <url>https://aclanthology.org/2020.coling-main.482/</url>
    </location>
    <part>
        <date>2020-12</date>
        <extent unit="page">
            <start>5530</start>
            <end>5534</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T Unsupervised Fine-tuning for Text Clustering
%A Huang, Shaohan
%A Wei, Furu
%A Cui, Lei
%A Zhang, Xingxing
%A Zhou, Ming
%Y Scott, Donia
%Y Bel, Nuria
%Y Zong, Chengqing
%S Proceedings of the 28th International Conference on Computational Linguistics
%D 2020
%8 December
%I International Committee on Computational Linguistics
%C Barcelona, Spain (Online)
%F huang-etal-2020-unsupervised
%X Fine-tuning with pre-trained language models (e.g. BERT) has achieved great success in many language understanding tasks in supervised settings (e.g. text classification). However, relatively little work has been focused on applying pre-trained models in unsupervised settings, such as text clustering. In this paper, we propose a novel method to fine-tune pre-trained models unsupervisedly for text clustering, which simultaneously learns text representations and cluster assignments using a clustering oriented loss. Experiments on three text clustering datasets (namely TREC-6, Yelp, and DBpedia) show that our model outperforms the baseline methods and achieves state-of-the-art results.
%R 10.18653/v1/2020.coling-main.482
%U https://aclanthology.org/2020.coling-main.482/
%U https://doi.org/10.18653/v1/2020.coling-main.482
%P 5530-5534


Markdown (Informal)

[Unsupervised Fine-tuning for Text Clustering](https://aclanthology.org/2020.coling-main.482/) (Huang et al., COLING 2020)