@inproceedings{yang-etal-2019-exploring-deep,
    title = "Exploring Deep Multimodal Fusion of Text and Photo for Hate Speech Classification",
    author = "Yang, Fan  and
      Peng, Xiaochang  and
      Ghosh, Gargi  and
      Shilon, Reshef  and
      Ma, Hao  and
      Moore, Eider  and
      Predovic, Goran",
    editor = "Roberts, Sarah T.  and
      Tetreault, Joel  and
      Prabhakaran, Vinodkumar  and
      Waseem, Zeerak",
    booktitle = "Proceedings of the Third Workshop on Abusive Language Online",
    month = aug,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W19-3502/",
    doi = "10.18653/v1/W19-3502",
    pages = "11--18",
    abstract = "Interactions among users on social network platforms are usually positive, constructive and insightful. However, sometimes people also get exposed to objectionable content such as hate speech, bullying, and verbal abuse etc. Most social platforms have explicit policy against hate speech because it creates an environment of intimidation and exclusion, and in some cases may promote real-world violence. As users' interactions on today{'}s social networks involve multiple modalities, such as texts, images and videos, in this paper we explore the challenge of automatically identifying hate speech with deep multimodal technologies, extending previous research which mostly focuses on the text signal alone. We present a number of fusion approaches to integrate text and photo signals. We show that augmenting text with image embedding information immediately leads to a boost in performance, while applying additional attention fusion methods brings further improvement."
}<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="yang-etal-2019-exploring-deep">
    <titleInfo>
        <title>Exploring Deep Multimodal Fusion of Text and Photo for Hate Speech Classification</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Fan</namePart>
        <namePart type="family">Yang</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Xiaochang</namePart>
        <namePart type="family">Peng</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Gargi</namePart>
        <namePart type="family">Ghosh</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Reshef</namePart>
        <namePart type="family">Shilon</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Hao</namePart>
        <namePart type="family">Ma</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Eider</namePart>
        <namePart type="family">Moore</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Goran</namePart>
        <namePart type="family">Predovic</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2019-08</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the Third Workshop on Abusive Language Online</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Sarah</namePart>
            <namePart type="given">T</namePart>
            <namePart type="family">Roberts</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Joel</namePart>
            <namePart type="family">Tetreault</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Vinodkumar</namePart>
            <namePart type="family">Prabhakaran</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Zeerak</namePart>
            <namePart type="family">Waseem</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Florence, Italy</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Interactions among users on social network platforms are usually positive, constructive and insightful. However, sometimes people also get exposed to objectionable content such as hate speech, bullying, and verbal abuse etc. Most social platforms have explicit policy against hate speech because it creates an environment of intimidation and exclusion, and in some cases may promote real-world violence. As users’ interactions on today’s social networks involve multiple modalities, such as texts, images and videos, in this paper we explore the challenge of automatically identifying hate speech with deep multimodal technologies, extending previous research which mostly focuses on the text signal alone. We present a number of fusion approaches to integrate text and photo signals. We show that augmenting text with image embedding information immediately leads to a boost in performance, while applying additional attention fusion methods brings further improvement.</abstract>
    <identifier type="citekey">yang-etal-2019-exploring-deep</identifier>
    <identifier type="doi">10.18653/v1/W19-3502</identifier>
    <location>
        <url>https://aclanthology.org/W19-3502/</url>
    </location>
    <part>
        <date>2019-08</date>
        <extent unit="page">
            <start>11</start>
            <end>18</end>
        </extent>
    </part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Exploring Deep Multimodal Fusion of Text and Photo for Hate Speech Classification
%A Yang, Fan
%A Peng, Xiaochang
%A Ghosh, Gargi
%A Shilon, Reshef
%A Ma, Hao
%A Moore, Eider
%A Predovic, Goran
%Y Roberts, Sarah T.
%Y Tetreault, Joel
%Y Prabhakaran, Vinodkumar
%Y Waseem, Zeerak
%S Proceedings of the Third Workshop on Abusive Language Online
%D 2019
%8 August
%I Association for Computational Linguistics
%C Florence, Italy
%F yang-etal-2019-exploring-deep
%X Interactions among users on social network platforms are usually positive, constructive and insightful. However, sometimes people also get exposed to objectionable content such as hate speech, bullying, and verbal abuse etc. Most social platforms have explicit policy against hate speech because it creates an environment of intimidation and exclusion, and in some cases may promote real-world violence. As users’ interactions on today’s social networks involve multiple modalities, such as texts, images and videos, in this paper we explore the challenge of automatically identifying hate speech with deep multimodal technologies, extending previous research which mostly focuses on the text signal alone. We present a number of fusion approaches to integrate text and photo signals. We show that augmenting text with image embedding information immediately leads to a boost in performance, while applying additional attention fusion methods brings further improvement.
%R 10.18653/v1/W19-3502
%U https://aclanthology.org/W19-3502/
%U https://doi.org/10.18653/v1/W19-3502
%P 11-18
Markdown (Informal)
[Exploring Deep Multimodal Fusion of Text and Photo for Hate Speech Classification](https://aclanthology.org/W19-3502/) (Yang et al., ALW 2019)
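The abstract above describes concatenation-based fusion of text and image embeddings, with attention-based fusion as a further refinement. As a minimal, illustrative sketch of the concatenation variant only (not the authors' implementation), the following PyTorch snippet shows one way to fuse a text embedding with a photo embedding for binary hate-speech classification; the embedding dimensions, module names, and single linear head are assumptions for illustration.

```python
# Illustrative sketch of concatenation fusion of text and photo embeddings.
# Dimensions and architecture are assumptions, not the paper's actual model.
import torch
import torch.nn as nn

class SimpleTextPhotoFusion(nn.Module):
    def __init__(self, text_dim=768, image_dim=2048, hidden_dim=256, num_classes=2):
        super().__init__()
        # Project each modality into a shared hidden space before fusing.
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.image_proj = nn.Linear(image_dim, hidden_dim)
        # Concatenation fusion followed by a classification head.
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(2 * hidden_dim, num_classes),
        )

    def forward(self, text_emb, image_emb):
        fused = torch.cat(
            [self.text_proj(text_emb), self.image_proj(image_emb)], dim=-1
        )
        return self.classifier(fused)

# Usage with random stand-in embeddings (batch of 4 posts).
model = SimpleTextPhotoFusion()
logits = model(torch.randn(4, 768), torch.randn(4, 2048))
print(logits.shape)  # torch.Size([4, 2])
```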