Semi-Supervised Disfluency Detection

Feng Wang; Wei Chen; Zhen Yang; Qianqian Dong; Shuang Xu; Bo Xu (徐波, 徐博)

Semi-Supervised Disfluency Detection

Feng Wang, Wei Chen, Zhen Yang, Qianqian Dong, Shuang Xu, Bo Xu

Correct Metadata for

Important: The Anthology treat PDFs as authoritative. Please use this form only to correct data that is out of line with the PDF. See our corrections guidelines if you need to change the PDF.

Title Adjust the title. Retain tags such as <fixed-case>.

Authors Adjust author names and order to match the PDF.

Abstract Correct abstract if needed. Retain XML formatting tags such as <tex-math>.

Verification against PDF Ensure that the new title/authors match the snapshot below. (If there is no snapshot or it is too small, consult the PDF.)

Authors concatenated from the text boxes above:

ALL author names match the snapshot above—including middle initials, hyphens, and accents.

Abstract

While the disfluency detection has achieved notable success in the past years, it still severely suffers from the data scarcity. To tackle this problem, we propose a novel semi-supervised approach which can utilize large amounts of unlabelled data. In this work, a light-weight neural net is proposed to extract the hidden features based solely on self-attention without any Recurrent Neural Network (RNN) or Convolutional Neural Network (CNN). In addition, we use the unlabelled corpus to enhance the performance. Besides, the Generative Adversarial Network (GAN) training is applied to enforce the similar distribution between the labelled and unlabelled data. The experimental results show that our approach achieves significant improvements over strong baselines.

Anthology ID:: C18-1299
Volume:: Proceedings of the 27th International Conference on Computational Linguistics
Month:: August
Year:: 2018
Address:: Santa Fe, New Mexico, USA
Editors:: Emily M. Bender, Leon Derczynski, Pierre Isabelle
Venue:: COLING
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 3529–3538
Language:
URL:: https://aclanthology.org/C18-1299/
DOI:
Bibkey:
Cite (ACL):: Feng Wang, Wei Chen, Zhen Yang, Qianqian Dong, Shuang Xu, and Bo Xu. 2018. Semi-Supervised Disfluency Detection. In Proceedings of the 27th International Conference on Computational Linguistics, pages 3529–3538, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):: Semi-Supervised Disfluency Detection (Wang et al., COLING 2018)
Copy Citation:
PDF:: https://aclanthology.org/C18-1299.pdf

PDF Cite Search Fix data

Export citation

BibTeX
MODS XML
Endnote
Preformatted

@inproceedings{wang-etal-2018-semi,
    title = "Semi-Supervised Disfluency Detection",
    author = "Wang, Feng  and
      Chen, Wei  and
      Yang, Zhen  and
      Dong, Qianqian  and
      Xu, Shuang  and
      Xu, Bo",
    editor = "Bender, Emily M.  and
      Derczynski, Leon  and
      Isabelle, Pierre",
    booktitle = "Proceedings of the 27th International Conference on Computational Linguistics",
    month = aug,
    year = "2018",
    address = "Santa Fe, New Mexico, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/C18-1299/",
    pages = "3529--3538",
    abstract = "While the disfluency detection has achieved notable success in the past years, it still severely suffers from the data scarcity. To tackle this problem, we propose a novel semi-supervised approach which can utilize large amounts of unlabelled data. In this work, a light-weight neural net is proposed to extract the hidden features based solely on self-attention without any Recurrent Neural Network (RNN) or Convolutional Neural Network (CNN). In addition, we use the unlabelled corpus to enhance the performance. Besides, the Generative Adversarial Network (GAN) training is applied to enforce the similar distribution between the labelled and unlabelled data. The experimental results show that our approach achieves significant improvements over strong baselines."
}

Download as File

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="wang-etal-2018-semi">
    <titleInfo>
        <title>Semi-Supervised Disfluency Detection</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Feng</namePart>
        <namePart type="family">Wang</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Wei</namePart>
        <namePart type="family">Chen</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Zhen</namePart>
        <namePart type="family">Yang</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Qianqian</namePart>
        <namePart type="family">Dong</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Shuang</namePart>
        <namePart type="family">Xu</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Bo</namePart>
        <namePart type="family">Xu</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2018-08</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 27th International Conference on Computational Linguistics</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Emily</namePart>
            <namePart type="given">M</namePart>
            <namePart type="family">Bender</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Leon</namePart>
            <namePart type="family">Derczynski</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Pierre</namePart>
            <namePart type="family">Isabelle</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Santa Fe, New Mexico, USA</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>While the disfluency detection has achieved notable success in the past years, it still severely suffers from the data scarcity. To tackle this problem, we propose a novel semi-supervised approach which can utilize large amounts of unlabelled data. In this work, a light-weight neural net is proposed to extract the hidden features based solely on self-attention without any Recurrent Neural Network (RNN) or Convolutional Neural Network (CNN). In addition, we use the unlabelled corpus to enhance the performance. Besides, the Generative Adversarial Network (GAN) training is applied to enforce the similar distribution between the labelled and unlabelled data. The experimental results show that our approach achieves significant improvements over strong baselines.</abstract>
    <identifier type="citekey">wang-etal-2018-semi</identifier>
    <location>
        <url>https://aclanthology.org/C18-1299/</url>
    </location>
    <part>
        <date>2018-08</date>
        <extent unit="page">
            <start>3529</start>
            <end>3538</end>
        </extent>
    </part>
</mods>
</modsCollection>

Download as File

%0 Conference Proceedings
%T Semi-Supervised Disfluency Detection
%A Wang, Feng
%A Chen, Wei
%A Yang, Zhen
%A Dong, Qianqian
%A Xu, Shuang
%A Xu, Bo
%Y Bender, Emily M.
%Y Derczynski, Leon
%Y Isabelle, Pierre
%S Proceedings of the 27th International Conference on Computational Linguistics
%D 2018
%8 August
%I Association for Computational Linguistics
%C Santa Fe, New Mexico, USA
%F wang-etal-2018-semi
%X While the disfluency detection has achieved notable success in the past years, it still severely suffers from the data scarcity. To tackle this problem, we propose a novel semi-supervised approach which can utilize large amounts of unlabelled data. In this work, a light-weight neural net is proposed to extract the hidden features based solely on self-attention without any Recurrent Neural Network (RNN) or Convolutional Neural Network (CNN). In addition, we use the unlabelled corpus to enhance the performance. Besides, the Generative Adversarial Network (GAN) training is applied to enforce the similar distribution between the labelled and unlabelled data. The experimental results show that our approach achieves significant improvements over strong baselines.
%U https://aclanthology.org/C18-1299/
%P 3529-3538

Download as File

Markdown (Informal)

[Semi-Supervised Disfluency Detection](https://aclanthology.org/C18-1299/) (Wang et al., COLING 2018)

Semi-Supervised Disfluency Detection (Wang et al., COLING 2018)

ACL

Feng Wang, Wei Chen, Zhen Yang, Qianqian Dong, Shuang Xu, and Bo Xu. 2018. Semi-Supervised Disfluency Detection. In Proceedings of the 27th International Conference on Computational Linguistics, pages 3529–3538, Santa Fe, New Mexico, USA. Association for Computational Linguistics.