Cross-Domain Detection of Abusive Language Online

Vanja M. Karan; Jan Šnajder

doi:10.18653/v1/W18-5117

Abstract

We investigate to what extent the models trained to detect general abusive language generalize between different datasets labeled with different abusive language types. To this end, we compare the cross-domain performance of simple classification models on nine different datasets, finding that the models fail to generalize to out-domain datasets and that having at least some in-domain data is important. We also show that using the frustratingly simple domain adaptation (Daume III, 2007) in most cases improves the results over in-domain training, especially when used to augment a smaller dataset with a larger one.
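
The abstract names the domain adaptation method without showing it. As an illustration only: the feature augmentation proposed in Daumé III (2007) copies each feature vector into a shared block plus one block per domain, and leaves the blocks of the other domains at zero. The Python sketch below assumes dense, list-based features and two domains. The function name `augment` and the integer domain indices are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def augment(features: Sequence[float], domain: int, n_domains: int = 2) -> List[float]:
    """Map x to [shared copy | domain-0 block | ... | domain-(K-1) block].

    Only the block belonging to `domain` holds a copy of x; the other
    domain blocks are filled with zeros. (Illustrative helper, not the
    authors' code.)
    """
    if not 0 <= domain < n_domains:
        raise ValueError("domain index out of range")
    x = list(features)
    zeros = [0.0] * len(x)
    out = list(x)  # shared block, seen by every domain
    for d in range(n_domains):
        out.extend(x if d == domain else zeros)
    return out


# Assumption for this example: domain 0 is the larger source dataset,
# domain 1 is the smaller target dataset.
source_vec = augment([1.0, 2.0], domain=0)
target_vec = augment([1.0, 2.0], domain=1)
```

A linear classifier trained on such augmented vectors can place weight on the shared block for cues that transfer across datasets and on a domain block for cues specific to one dataset.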

Anthology ID: W18-5117
Volume: Proceedings of the 2nd Workshop on Abusive Language Online (ALW2)
Month: October
Year: 2018
Address: Brussels, Belgium
Editors: Darja Fišer, Ruihong Huang, Vinodkumar Prabhakaran, Rob Voigt, Zeerak Waseem, Jacqueline Wernimont
Venue: ALW
Publisher: Association for Computational Linguistics
Pages: 132–137
URL: https://aclanthology.org/W18-5117/
DOI: 10.18653/v1/W18-5117
Cite (ACL): Vanja Mladen Karan and Jan Šnajder. 2018. Cross-Domain Detection of Abusive Language Online. In Proceedings of the 2nd Workshop on Abusive Language Online (ALW2), pages 132–137, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal): Cross-Domain Detection of Abusive Language Online (Karan & Šnajder, ALW 2018)
PDF: https://aclanthology.org/W18-5117.pdf

BibTeX

@inproceedings{karan-snajder-2018-cross,
    title = "Cross-Domain Detection of Abusive Language Online",
    author = "Karan, Vanja Mladen  and
      {\v{S}}najder, Jan",
    editor = "Fi{\v{s}}er, Darja  and
      Huang, Ruihong  and
      Prabhakaran, Vinodkumar  and
      Voigt, Rob  and
      Waseem, Zeerak  and
      Wernimont, Jacqueline",
    booktitle = "Proceedings of the 2nd Workshop on Abusive Language Online ({ALW}2)",
    month = oct,
    year = "2018",
    address = "Brussels, Belgium",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W18-5117/",
    doi = "10.18653/v1/W18-5117",
    pages = "132--137",
    abstract = "We investigate to what extent the models trained to detect general abusive language generalize between different datasets labeled with different abusive language types. To this end, we compare the cross-domain performance of simple classification models on nine different datasets, finding that the models fail to generalize to out-domain datasets and that having at least some in-domain data is important. We also show that using the frustratingly simple domain adaptation (Daume III, 2007) in most cases improves the results over in-domain training, especially when used to augment a smaller dataset with a larger one."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="karan-snajder-2018-cross">
    <titleInfo>
        <title>Cross-Domain Detection of Abusive Language Online</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Vanja</namePart>
        <namePart type="given">Mladen</namePart>
        <namePart type="family">Karan</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Jan</namePart>
        <namePart type="family">Šnajder</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2018-10</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 2nd Workshop on Abusive Language Online (ALW2)</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Darja</namePart>
            <namePart type="family">Fišer</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Ruihong</namePart>
            <namePart type="family">Huang</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Vinodkumar</namePart>
            <namePart type="family">Prabhakaran</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Rob</namePart>
            <namePart type="family">Voigt</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Zeerak</namePart>
            <namePart type="family">Waseem</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Jacqueline</namePart>
            <namePart type="family">Wernimont</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Brussels, Belgium</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>We investigate to what extent the models trained to detect general abusive language generalize between different datasets labeled with different abusive language types. To this end, we compare the cross-domain performance of simple classification models on nine different datasets, finding that the models fail to generalize to out-domain datasets and that having at least some in-domain data is important. We also show that using the frustratingly simple domain adaptation (Daume III, 2007) in most cases improves the results over in-domain training, especially when used to augment a smaller dataset with a larger one.</abstract>
    <identifier type="citekey">karan-snajder-2018-cross</identifier>
    <identifier type="doi">10.18653/v1/W18-5117</identifier>
    <location>
        <url>https://aclanthology.org/W18-5117/</url>
    </location>
    <part>
        <date>2018-10</date>
        <extent unit="page">
            <start>132</start>
            <end>137</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T Cross-Domain Detection of Abusive Language Online
%A Karan, Vanja Mladen
%A Šnajder, Jan
%Y Fišer, Darja
%Y Huang, Ruihong
%Y Prabhakaran, Vinodkumar
%Y Voigt, Rob
%Y Waseem, Zeerak
%Y Wernimont, Jacqueline
%S Proceedings of the 2nd Workshop on Abusive Language Online (ALW2)
%D 2018
%8 October
%I Association for Computational Linguistics
%C Brussels, Belgium
%F karan-snajder-2018-cross
%X We investigate to what extent the models trained to detect general abusive language generalize between different datasets labeled with different abusive language types. To this end, we compare the cross-domain performance of simple classification models on nine different datasets, finding that the models fail to generalize to out-domain datasets and that having at least some in-domain data is important. We also show that using the frustratingly simple domain adaptation (Daume III, 2007) in most cases improves the results over in-domain training, especially when used to augment a smaller dataset with a larger one.
%R 10.18653/v1/W18-5117
%U https://aclanthology.org/W18-5117/
%U https://doi.org/10.18653/v1/W18-5117
%P 132-137

Markdown (Informal)

[Cross-Domain Detection of Abusive Language Online](https://aclanthology.org/W18-5117/) (Karan & Šnajder, ALW 2018)