Auto-Dialabel: Labeling Dialogue Data with Unsupervised Learning

Chen Shi; Qi Chen; Lei Sha; Sujian Li (李素建); Xu Sun; Houfeng Wang; Lintao Zhang

doi:10.18653/v1/D18-1072

Abstract

The lack of labeled data is one of the main challenges when building a task-oriented dialogue system. Existing dialogue datasets usually rely on human labeling, which is expensive, limited in size, and in low coverage. In this paper, we instead propose our framework auto-dialabel to automatically cluster the dialogue intents and slots. In this framework, we collect a set of context features, leverage an autoencoder for feature assembly, and adapt a dynamic hierarchical clustering method for intent and slot labeling. Experimental results show that our framework can promote human labeling cost to a great extent, achieve good intent clustering accuracy (84.1%), and provide reasonable and instructive slot labeling results.
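The abstract does not spell out the clustering procedure, so the following is only a rough illustration of the general idea behind a hierarchical clustering whose number of clusters is not fixed in advance. It is a generic average-linkage agglomerative clustering that stops at a distance threshold, not the paper's actual method. The autoencoder step is omitted: the feature vectors are assumed to be given, and the names `average_linkage`, `cluster_by_threshold`, `features`, and `threshold` are hypothetical.

```python
from math import dist


def average_linkage(a, b, points):
    """Mean pairwise Euclidean distance between two clusters of point indices."""
    return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))


def cluster_by_threshold(points, threshold):
    """Agglomerative clustering that stops once no pair of clusters is
    closer than `threshold`, so the cluster count is decided by the data."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        # Find the closest pair of clusters under average linkage.
        d, x, y = min(
            (average_linkage(clusters[x], clusters[y], points), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if d > threshold:
            break
        # Merge cluster y into cluster x (x < y, so deleting y keeps x valid).
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Hypothetical 2-D feature vectors standing in for encoded utterances.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
clusters_found = cluster_by_threshold(features, threshold=1.0)
```

With these toy inputs, the two tight pairs are merged and the isolated point stays in its own cluster; each resulting group of indices could then be treated as one candidate intent to be named by a human annotator.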

Anthology ID: D18-1072
Volume: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Month: October-November
Year: 2018
Address: Brussels, Belgium
Editors: Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
Venue: EMNLP
SIG: SIGDAT
Publisher: Association for Computational Linguistics
Pages: 684–689
URL: https://aclanthology.org/D18-1072/
DOI: 10.18653/v1/D18-1072
Bibkey: shi-etal-2018-auto
Cite (ACL): Chen Shi, Qi Chen, Lei Sha, Sujian Li, Xu Sun, Houfeng Wang, and Lintao Zhang. 2018. Auto-Dialabel: Labeling Dialogue Data with Unsupervised Learning. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 684–689, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal): Auto-Dialabel: Labeling Dialogue Data with Unsupervised Learning (Shi et al., EMNLP 2018)
PDF: https://aclanthology.org/D18-1072.pdf

Export citation

BibTeX

@inproceedings{shi-etal-2018-auto,
    title = "Auto-Dialabel: Labeling Dialogue Data with Unsupervised Learning",
    author = "Shi, Chen  and
      Chen, Qi  and
      Sha, Lei  and
      Li, Sujian  and
      Sun, Xu  and
      Wang, Houfeng  and
      Zhang, Lintao",
    editor = "Riloff, Ellen  and
      Chiang, David  and
      Hockenmaier, Julia  and
      Tsujii, Jun{'}ichi",
    booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
    month = oct # "-" # nov,
    year = "2018",
    address = "Brussels, Belgium",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D18-1072/",
    doi = "10.18653/v1/D18-1072",
    pages = "684--689",
    abstract = "The lack of labeled data is one of the main challenges when building a task-oriented dialogue system. Existing dialogue datasets usually rely on human labeling, which is expensive, limited in size, and in low coverage. In this paper, we instead propose our framework auto-dialabel to automatically cluster the dialogue intents and slots. In this framework, we collect a set of context features, leverage an autoencoder for feature assembly, and adapt a dynamic hierarchical clustering method for intent and slot labeling. Experimental results show that our framework can promote human labeling cost to a great extent, achieve good intent clustering accuracy (84.1{\%}), and provide reasonable and instructive slot labeling results."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="shi-etal-2018-auto">
    <titleInfo>
        <title>Auto-Dialabel: Labeling Dialogue Data with Unsupervised Learning</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Chen</namePart>
        <namePart type="family">Shi</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Qi</namePart>
        <namePart type="family">Chen</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Lei</namePart>
        <namePart type="family">Sha</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Sujian</namePart>
        <namePart type="family">Li</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Xu</namePart>
        <namePart type="family">Sun</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Houfeng</namePart>
        <namePart type="family">Wang</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Lintao</namePart>
        <namePart type="family">Zhang</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2018-oct-nov</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Ellen</namePart>
            <namePart type="family">Riloff</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">David</namePart>
            <namePart type="family">Chiang</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Julia</namePart>
            <namePart type="family">Hockenmaier</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Jun’ichi</namePart>
            <namePart type="family">Tsujii</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Brussels, Belgium</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>The lack of labeled data is one of the main challenges when building a task-oriented dialogue system. Existing dialogue datasets usually rely on human labeling, which is expensive, limited in size, and in low coverage. In this paper, we instead propose our framework auto-dialabel to automatically cluster the dialogue intents and slots. In this framework, we collect a set of context features, leverage an autoencoder for feature assembly, and adapt a dynamic hierarchical clustering method for intent and slot labeling. Experimental results show that our framework can promote human labeling cost to a great extent, achieve good intent clustering accuracy (84.1%), and provide reasonable and instructive slot labeling results.</abstract>
    <identifier type="citekey">shi-etal-2018-auto</identifier>
    <identifier type="doi">10.18653/v1/D18-1072</identifier>
    <location>
        <url>https://aclanthology.org/D18-1072/</url>
    </location>
    <part>
        <date>2018-oct-nov</date>
        <extent unit="page">
            <start>684</start>
            <end>689</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T Auto-Dialabel: Labeling Dialogue Data with Unsupervised Learning
%A Shi, Chen
%A Chen, Qi
%A Sha, Lei
%A Li, Sujian
%A Sun, Xu
%A Wang, Houfeng
%A Zhang, Lintao
%Y Riloff, Ellen
%Y Chiang, David
%Y Hockenmaier, Julia
%Y Tsujii, Jun’ichi
%S Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
%D 2018
%8 oct nov
%I Association for Computational Linguistics
%C Brussels, Belgium
%F shi-etal-2018-auto
%X The lack of labeled data is one of the main challenges when building a task-oriented dialogue system. Existing dialogue datasets usually rely on human labeling, which is expensive, limited in size, and in low coverage. In this paper, we instead propose our framework auto-dialabel to automatically cluster the dialogue intents and slots. In this framework, we collect a set of context features, leverage an autoencoder for feature assembly, and adapt a dynamic hierarchical clustering method for intent and slot labeling. Experimental results show that our framework can promote human labeling cost to a great extent, achieve good intent clustering accuracy (84.1%), and provide reasonable and instructive slot labeling results.
%R 10.18653/v1/D18-1072
%U https://aclanthology.org/D18-1072/
%U https://doi.org/10.18653/v1/D18-1072
%P 684-689

Preformatted

Markdown (Informal)

[Auto-Dialabel: Labeling Dialogue Data with Unsupervised Learning](https://aclanthology.org/D18-1072/) (Shi et al., EMNLP 2018)
