Neural Network Prediction of Censorable Language

Kei Yin Ng; Anna Feldman; Jing Peng; Chris Leberknight

doi:10.18653/v1/W19-2105

Abstract

Internet censorship imposes restrictions on what information can be publicized or viewed on the Internet. According to Freedom House’s annual Freedom on the Net report, more than half the world’s Internet users now live in a place where the Internet is censored or restricted. China has built the world’s most extensive and sophisticated online censorship system. In this paper, we describe a new corpus of censored and uncensored social media posts from Sina Weibo, a Chinese microblogging website, collected by tracking posts that mention ‘sensitive’ topics or are authored by ‘sensitive’ users. We use this corpus to build a neural network classifier to predict censorship. Our model achieves 88.50% accuracy using only linguistic features. We discuss these features in detail and hypothesize that they could potentially be used for censorship circumvention.

Anthology ID:: W19-2105
Volume:: Proceedings of the Third Workshop on Natural Language Processing and Computational Social Science
Month:: June
Year:: 2019
Address:: Minneapolis, Minnesota
Editors:: Svitlana Volkova, David Jurgens, Dirk Hovy, David Bamman, Oren Tsur
Venue:: NLP+CSS
Publisher:: Association for Computational Linguistics
Pages:: 40–46
URL:: https://aclanthology.org/W19-2105/
DOI:: 10.18653/v1/W19-2105
Bibkey:: ng-etal-2019-neural
Cite (ACL):: Kei Yin Ng, Anna Feldman, Jing Peng, and Chris Leberknight. 2019. Neural Network Prediction of Censorable Language. In Proceedings of the Third Workshop on Natural Language Processing and Computational Social Science, pages 40–46, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal):: Neural Network Prediction of Censorable Language (Ng et al., NLP+CSS 2019)
PDF:: https://aclanthology.org/W19-2105.pdf

Export citation

BibTeX

@inproceedings{ng-etal-2019-neural,
    title = "Neural Network Prediction of Censorable Language",
    author = "Ng, Kei Yin  and
      Feldman, Anna  and
      Peng, Jing  and
      Leberknight, Chris",
    editor = "Volkova, Svitlana  and
      Jurgens, David  and
      Hovy, Dirk  and
      Bamman, David  and
      Tsur, Oren",
    booktitle = "Proceedings of the Third Workshop on Natural Language Processing and Computational Social Science",
    month = jun,
    year = "2019",
    address = "Minneapolis, Minnesota",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W19-2105/",
    doi = "10.18653/v1/W19-2105",
    pages = "40--46",
    abstract = "Internet censorship imposes restrictions on what information can be publicized or viewed on the Internet. According to Freedom House{'}s annual Freedom on the Net report, more than half the world{'}s Internet users now live in a place where the Internet is censored or restricted. China has built the world{'}s most extensive and sophisticated online censorship system. In this paper, we describe a new corpus of censored and uncensored social media tweets from a Chinese microblogging website, Sina Weibo, collected by tracking posts that mention `sensitive' topics or authored by `sensitive' users. We use this corpus to build a neural network classifier to predict censorship. Our model performs with a 88.50{\%} accuracy using only linguistic features. We discuss these features in detail and hypothesize that they could potentially be used for censorship circumvention."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="ng-etal-2019-neural">
    <titleInfo>
        <title>Neural Network Prediction of Censorable Language</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Kei</namePart>
        <namePart type="given">Yin</namePart>
        <namePart type="family">Ng</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Anna</namePart>
        <namePart type="family">Feldman</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Jing</namePart>
        <namePart type="family">Peng</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Chris</namePart>
        <namePart type="family">Leberknight</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2019-06</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the Third Workshop on Natural Language Processing and Computational Social Science</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Svitlana</namePart>
            <namePart type="family">Volkova</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">David</namePart>
            <namePart type="family">Jurgens</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Dirk</namePart>
            <namePart type="family">Hovy</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">David</namePart>
            <namePart type="family">Bamman</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Oren</namePart>
            <namePart type="family">Tsur</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Minneapolis, Minnesota</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Internet censorship imposes restrictions on what information can be publicized or viewed on the Internet. According to Freedom House’s annual Freedom on the Net report, more than half the world’s Internet users now live in a place where the Internet is censored or restricted. China has built the world’s most extensive and sophisticated online censorship system. In this paper, we describe a new corpus of censored and uncensored social media tweets from a Chinese microblogging website, Sina Weibo, collected by tracking posts that mention ‘sensitive’ topics or authored by ‘sensitive’ users. We use this corpus to build a neural network classifier to predict censorship. Our model performs with a 88.50% accuracy using only linguistic features. We discuss these features in detail and hypothesize that they could potentially be used for censorship circumvention.</abstract>
    <identifier type="citekey">ng-etal-2019-neural</identifier>
    <identifier type="doi">10.18653/v1/W19-2105</identifier>
    <location>
        <url>https://aclanthology.org/W19-2105/</url>
    </location>
    <part>
        <date>2019-06</date>
        <extent unit="page">
            <start>40</start>
            <end>46</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T Neural Network Prediction of Censorable Language
%A Ng, Kei Yin
%A Feldman, Anna
%A Peng, Jing
%A Leberknight, Chris
%Y Volkova, Svitlana
%Y Jurgens, David
%Y Hovy, Dirk
%Y Bamman, David
%Y Tsur, Oren
%S Proceedings of the Third Workshop on Natural Language Processing and Computational Social Science
%D 2019
%8 June
%I Association for Computational Linguistics
%C Minneapolis, Minnesota
%F ng-etal-2019-neural
%X Internet censorship imposes restrictions on what information can be publicized or viewed on the Internet. According to Freedom House’s annual Freedom on the Net report, more than half the world’s Internet users now live in a place where the Internet is censored or restricted. China has built the world’s most extensive and sophisticated online censorship system. In this paper, we describe a new corpus of censored and uncensored social media tweets from a Chinese microblogging website, Sina Weibo, collected by tracking posts that mention ‘sensitive’ topics or authored by ‘sensitive’ users. We use this corpus to build a neural network classifier to predict censorship. Our model performs with a 88.50% accuracy using only linguistic features. We discuss these features in detail and hypothesize that they could potentially be used for censorship circumvention.
%R 10.18653/v1/W19-2105
%U https://aclanthology.org/W19-2105/
%U https://doi.org/10.18653/v1/W19-2105
%P 40-46

Markdown (Informal)

[Neural Network Prediction of Censorable Language](https://aclanthology.org/W19-2105/) (Ng et al., NLP+CSS 2019)