SetConv: A New Approach for Learning from Imbalanced Data

Yang Gao (扬 高); Yi-Fan Li; Yu Lin; Charu Aggarwal; Latifur Khan

doi:10.18653/v1/2020.emnlp-main.98

Abstract

For many real-world classification problems, e.g., sentiment classification, most existing machine learning methods are biased towards the majority class when the Imbalance Ratio (IR) is high. To address this problem, we propose a set convolution (SetConv) operation and an episodic training strategy to extract a single representative for each class, so that classifiers can later be trained on a balanced class distribution. We prove that our proposed algorithm is permutation-invariant despite the order of inputs, and experiments on multiple large-scale benchmark text datasets show the superiority of our proposed framework when compared to other SOTA methods.

Anthology ID: 2020.emnlp-main.98
Volume: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month: November
Year: 2020
Address: Online
Editors: Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 1284–1294
URL: https://aclanthology.org/2020.emnlp-main.98/
DOI: 10.18653/v1/2020.emnlp-main.98
Cite (ACL): Yang Gao, Yi-Fan Li, Yu Lin, Charu Aggarwal, and Latifur Khan. 2020. SetConv: A New Approach for Learning from Imbalanced Data. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1284–1294, Online. Association for Computational Linguistics.
Cite (Informal): SetConv: A New Approach for Learning from Imbalanced Data (Gao et al., EMNLP 2020)
PDF: https://aclanthology.org/2020.emnlp-main.98.pdf

BibTeX

@inproceedings{gao-etal-2020-setconv,
    title = "{S}et{C}onv: {A} {N}ew {A}pproach for {L}earning from {I}mbalanced {D}ata",
    author = "Gao, Yang  and
      Li, Yi-Fan  and
      Lin, Yu  and
      Aggarwal, Charu  and
      Khan, Latifur",
    editor = "Webber, Bonnie  and
      Cohn, Trevor  and
      He, Yulan  and
      Liu, Yang",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.emnlp-main.98/",
    doi = "10.18653/v1/2020.emnlp-main.98",
    pages = "1284--1294",
    abstract = "For many real-world classification problems, e.g., sentiment classification, most existing machine learning methods are biased towards the majority class when the Imbalance Ratio (IR) is high. To address this problem, we propose a set convolution (SetConv) operation and an episodic training strategy to extract a single representative for each class, so that classifiers can later be trained on a balanced class distribution. We prove that our proposed algorithm is permutation-invariant despite the order of inputs, and experiments on multiple large-scale benchmark text datasets show the superiority of our proposed framework when compared to other SOTA methods."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="gao-etal-2020-setconv">
    <titleInfo>
        <title>SetConv: A New Approach for Learning from Imbalanced Data</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Yang</namePart>
        <namePart type="family">Gao</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Yi-Fan</namePart>
        <namePart type="family">Li</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Yu</namePart>
        <namePart type="family">Lin</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Charu</namePart>
        <namePart type="family">Aggarwal</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Latifur</namePart>
        <namePart type="family">Khan</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2020-11</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Bonnie</namePart>
            <namePart type="family">Webber</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Trevor</namePart>
            <namePart type="family">Cohn</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Yulan</namePart>
            <namePart type="family">He</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Yang</namePart>
            <namePart type="family">Liu</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Online</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>For many real-world classification problems, e.g., sentiment classification, most existing machine learning methods are biased towards the majority class when the Imbalance Ratio (IR) is high. To address this problem, we propose a set convolution (SetConv) operation and an episodic training strategy to extract a single representative for each class, so that classifiers can later be trained on a balanced class distribution. We prove that our proposed algorithm is permutation-invariant despite the order of inputs, and experiments on multiple large-scale benchmark text datasets show the superiority of our proposed framework when compared to other SOTA methods.</abstract>
    <identifier type="citekey">gao-etal-2020-setconv</identifier>
    <identifier type="doi">10.18653/v1/2020.emnlp-main.98</identifier>
    <location>
        <url>https://aclanthology.org/2020.emnlp-main.98/</url>
    </location>
    <part>
        <date>2020-11</date>
        <extent unit="page">
            <start>1284</start>
            <end>1294</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T SetConv: A New Approach for Learning from Imbalanced Data
%A Gao, Yang
%A Li, Yi-Fan
%A Lin, Yu
%A Aggarwal, Charu
%A Khan, Latifur
%Y Webber, Bonnie
%Y Cohn, Trevor
%Y He, Yulan
%Y Liu, Yang
%S Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
%D 2020
%8 November
%I Association for Computational Linguistics
%C Online
%F gao-etal-2020-setconv
%X For many real-world classification problems, e.g., sentiment classification, most existing machine learning methods are biased towards the majority class when the Imbalance Ratio (IR) is high. To address this problem, we propose a set convolution (SetConv) operation and an episodic training strategy to extract a single representative for each class, so that classifiers can later be trained on a balanced class distribution. We prove that our proposed algorithm is permutation-invariant despite the order of inputs, and experiments on multiple large-scale benchmark text datasets show the superiority of our proposed framework when compared to other SOTA methods.
%R 10.18653/v1/2020.emnlp-main.98
%U https://aclanthology.org/2020.emnlp-main.98/
%U https://doi.org/10.18653/v1/2020.emnlp-main.98
%P 1284-1294

Markdown (Informal)

[SetConv: A New Approach for Learning from Imbalanced Data](https://aclanthology.org/2020.emnlp-main.98/) (Gao et al., EMNLP 2020)
