Learning Representations for Detecting Abusive Language

Magnus Sahlgren; Tim Isbister; Fredrik Olsson

doi:10.18653/v1/W18-5115

Abstract

This paper discusses the question of whether it is possible to learn a generic representation that is useful for detecting various types of abusive language. The approach is inspired by recent advances in transfer learning and word embeddings, and we learn representations from two different datasets containing various degrees of abusive language. We compare the learned representation with two standard approaches: one based on lexica, and one based on data-specific n-grams. Our experiments show that learned representations do contain useful information that can be used to improve detection performance when training data is limited.

Anthology ID: W18-5115
Volume: Proceedings of the 2nd Workshop on Abusive Language Online (ALW2)
Month: October
Year: 2018
Address: Brussels, Belgium
Editors: Darja Fišer, Ruihong Huang, Vinodkumar Prabhakaran, Rob Voigt, Zeerak Waseem, Jacqueline Wernimont
Venue: ALW
Publisher: Association for Computational Linguistics
Pages: 115–123
URL: https://aclanthology.org/W18-5115/
DOI: 10.18653/v1/W18-5115
Bibkey: sahlgren-etal-2018-learning
Cite (ACL): Magnus Sahlgren, Tim Isbister, and Fredrik Olsson. 2018. Learning Representations for Detecting Abusive Language. In Proceedings of the 2nd Workshop on Abusive Language Online (ALW2), pages 115–123, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal): Learning Representations for Detecting Abusive Language (Sahlgren et al., ALW 2018)
PDF: https://aclanthology.org/W18-5115.pdf

Export citation

BibTeX

@inproceedings{sahlgren-etal-2018-learning,
    title = "Learning Representations for Detecting Abusive Language",
    author = "Sahlgren, Magnus  and
      Isbister, Tim  and
      Olsson, Fredrik",
    editor = "Fi{\v{s}}er, Darja  and
      Huang, Ruihong  and
      Prabhakaran, Vinodkumar  and
      Voigt, Rob  and
      Waseem, Zeerak  and
      Wernimont, Jacqueline",
    booktitle = "Proceedings of the 2nd Workshop on Abusive Language Online ({ALW}2)",
    month = oct,
    year = "2018",
    address = "Brussels, Belgium",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W18-5115/",
    doi = "10.18653/v1/W18-5115",
    pages = "115--123",
    abstract = "This paper discusses the question whether it is possible to learn a generic representation that is useful for detecting various types of abusive language. The approach is inspired by recent advances in transfer learning and word embeddings, and we learn representations from two different datasets containing various degrees of abusive language. We compare the learned representation with two standard approaches; one based on lexica, and one based on data-specific $n$-grams. Our experiments show that learned representations \textit{do} contain useful information that can be used to improve detection performance when training data is limited."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="sahlgren-etal-2018-learning">
    <titleInfo>
        <title>Learning Representations for Detecting Abusive Language</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Magnus</namePart>
        <namePart type="family">Sahlgren</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Tim</namePart>
        <namePart type="family">Isbister</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Fredrik</namePart>
        <namePart type="family">Olsson</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2018-10</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 2nd Workshop on Abusive Language Online (ALW2)</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Darja</namePart>
            <namePart type="family">Fišer</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Ruihong</namePart>
            <namePart type="family">Huang</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Vinodkumar</namePart>
            <namePart type="family">Prabhakaran</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Rob</namePart>
            <namePart type="family">Voigt</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Zeerak</namePart>
            <namePart type="family">Waseem</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Jacqueline</namePart>
            <namePart type="family">Wernimont</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Brussels, Belgium</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>This paper discusses the question whether it is possible to learn a generic representation that is useful for detecting various types of abusive language. The approach is inspired by recent advances in transfer learning and word embeddings, and we learn representations from two different datasets containing various degrees of abusive language. We compare the learned representation with two standard approaches; one based on lexica, and one based on data-specific n-grams. Our experiments show that learned representations do contain useful information that can be used to improve detection performance when training data is limited.</abstract>
    <identifier type="citekey">sahlgren-etal-2018-learning</identifier>
    <identifier type="doi">10.18653/v1/W18-5115</identifier>
    <location>
        <url>https://aclanthology.org/W18-5115/</url>
    </location>
    <part>
        <date>2018-10</date>
        <extent unit="page">
            <start>115</start>
            <end>123</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T Learning Representations for Detecting Abusive Language
%A Sahlgren, Magnus
%A Isbister, Tim
%A Olsson, Fredrik
%Y Fišer, Darja
%Y Huang, Ruihong
%Y Prabhakaran, Vinodkumar
%Y Voigt, Rob
%Y Waseem, Zeerak
%Y Wernimont, Jacqueline
%S Proceedings of the 2nd Workshop on Abusive Language Online (ALW2)
%D 2018
%8 October
%I Association for Computational Linguistics
%C Brussels, Belgium
%F sahlgren-etal-2018-learning
%X This paper discusses the question whether it is possible to learn a generic representation that is useful for detecting various types of abusive language. The approach is inspired by recent advances in transfer learning and word embeddings, and we learn representations from two different datasets containing various degrees of abusive language. We compare the learned representation with two standard approaches; one based on lexica, and one based on data-specific n-grams. Our experiments show that learned representations do contain useful information that can be used to improve detection performance when training data is limited.
%R 10.18653/v1/W18-5115
%U https://aclanthology.org/W18-5115/
%U https://doi.org/10.18653/v1/W18-5115
%P 115-123

Preformatted

Markdown (Informal)

[Learning Representations for Detecting Abusive Language](https://aclanthology.org/W18-5115/) (Sahlgren et al., ALW 2018)
