Topic Stability over Noisy Sources

Jing Su; Derek Greene; Oisín Boydell

Abstract

Topic modelling techniques such as LDA have recently been applied to speech transcripts and OCR output. These corpora may contain noisy or erroneous texts which may undermine topic stability. Therefore, it is important to know how well a topic modelling algorithm will perform when applied to noisy data. In this paper we show that different types of textual noise can have diverse effects on the stability of topic models. On the other hand, topic model stability is not consistent with the same type but different levels of noise. We introduce a dictionary filtering approach to address this challenge, with the result that a topic model with the correct number of topics is always identified across different levels of noise.
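The abstract names a dictionary filtering approach but gives no detail here. The following is only a minimal sketch of one plausible reading: drop every token that does not appear in a reference vocabulary before the text reaches the topic model. The function names `tokenize` and `dictionary_filter`, the toy vocabulary, and the sample sentence are all hypothetical and are not taken from the paper.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def dictionary_filter(tokens, vocabulary):
    """Keep only tokens found in the reference vocabulary (assumed filter rule)."""
    vocab = {word.lower() for word in vocabulary}
    return [token for token in tokens if token.lower() in vocab]


# Hypothetical toy vocabulary and a sentence with OCR-style errors.
vocabulary = {"topic", "model", "stability", "noise"}
noisy_doc = "Topic m0del stabilty under OCR noise"

clean_tokens = dictionary_filter(tokenize(noisy_doc), vocabulary)
print(clean_tokens)  # ['topic', 'noise']
```

In this sketch the corrupted forms "m0del" and "stabilty" are discarded rather than repaired, so the filter trades recall for a cleaner vocabulary. Whether the paper's filter works this way should be checked against the PDF.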

Anthology ID: W16-3913
Volume: Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)
Month: December
Year: 2016
Address: Osaka, Japan
Editors: Bo Han, Alan Ritter, Leon Derczynski, Wei Xu, Tim Baldwin
Venue: WNUT
Publisher: The COLING 2016 Organizing Committee
Pages: 85–93
URL: https://aclanthology.org/W16-3913/
Cite (ACL): Jing Su, Derek Greene, and Oisín Boydell. 2016. Topic Stability over Noisy Sources. In Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT), pages 85–93, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal): Topic Stability over Noisy Sources (Su et al., WNUT 2016)
PDF: https://aclanthology.org/W16-3913.pdf

Export citation

BibTeX

@inproceedings{su-etal-2016-topic,
    title = "Topic Stability over Noisy Sources",
    author = "Su, Jing  and
      Greene, Derek  and
      Boydell, Ois{\'i}n",
    editor = "Han, Bo  and
      Ritter, Alan  and
      Derczynski, Leon  and
      Xu, Wei  and
      Baldwin, Tim",
    booktitle = "Proceedings of the 2nd Workshop on Noisy User-generated Text ({WNUT})",
    month = dec,
    year = "2016",
    address = "Osaka, Japan",
    publisher = "The COLING 2016 Organizing Committee",
    url = "https://aclanthology.org/W16-3913/",
    pages = "85--93",
    abstract = "Topic modelling techniques such as LDA have recently been applied to speech transcripts and OCR output. These corpora may contain noisy or erroneous texts which may undermine topic stability. Therefore, it is important to know how well a topic modelling algorithm will perform when applied to noisy data. In this paper we show that different types of textual noise can have diverse effects on the stability of topic models. On the other hand, topic model stability is not consistent with the same type but different levels of noise. We introduce a dictionary filtering approach to address this challenge, with the result that a topic model with the correct number of topics is always identified across different levels of noise."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="su-etal-2016-topic">
    <titleInfo>
        <title>Topic Stability over Noisy Sources</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Jing</namePart>
        <namePart type="family">Su</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Derek</namePart>
        <namePart type="family">Greene</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Oisín</namePart>
        <namePart type="family">Boydell</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2016-12</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Bo</namePart>
            <namePart type="family">Han</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Alan</namePart>
            <namePart type="family">Ritter</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Leon</namePart>
            <namePart type="family">Derczynski</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Wei</namePart>
            <namePart type="family">Xu</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Tim</namePart>
            <namePart type="family">Baldwin</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>The COLING 2016 Organizing Committee</publisher>
            <place>
                <placeTerm type="text">Osaka, Japan</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Topic modelling techniques such as LDA have recently been applied to speech transcripts and OCR output. These corpora may contain noisy or erroneous texts which may undermine topic stability. Therefore, it is important to know how well a topic modelling algorithm will perform when applied to noisy data. In this paper we show that different types of textual noise can have diverse effects on the stability of topic models. On the other hand, topic model stability is not consistent with the same type but different levels of noise. We introduce a dictionary filtering approach to address this challenge, with the result that a topic model with the correct number of topics is always identified across different levels of noise.</abstract>
    <identifier type="citekey">su-etal-2016-topic</identifier>
    <location>
        <url>https://aclanthology.org/W16-3913/</url>
    </location>
    <part>
        <date>2016-12</date>
        <extent unit="page">
            <start>85</start>
            <end>93</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T Topic Stability over Noisy Sources
%A Su, Jing
%A Greene, Derek
%A Boydell, Oisín
%Y Han, Bo
%Y Ritter, Alan
%Y Derczynski, Leon
%Y Xu, Wei
%Y Baldwin, Tim
%S Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)
%D 2016
%8 December
%I The COLING 2016 Organizing Committee
%C Osaka, Japan
%F su-etal-2016-topic
%X Topic modelling techniques such as LDA have recently been applied to speech transcripts and OCR output. These corpora may contain noisy or erroneous texts which may undermine topic stability. Therefore, it is important to know how well a topic modelling algorithm will perform when applied to noisy data. In this paper we show that different types of textual noise can have diverse effects on the stability of topic models. On the other hand, topic model stability is not consistent with the same type but different levels of noise. We introduce a dictionary filtering approach to address this challenge, with the result that a topic model with the correct number of topics is always identified across different levels of noise.
%U https://aclanthology.org/W16-3913/
%P 85-93

Markdown (Informal)

[Topic Stability over Noisy Sources](https://aclanthology.org/W16-3913/) (Su et al., WNUT 2016)
