KorNLI and KorSTS: New Benchmark Datasets for Korean Natural Language Understanding

Jiyeon Ham; Yo Joong Choe; Kyubyong Park; Ilji Choi; Hyungjoon Soh

doi:10.18653/v1/2020.findings-emnlp.39

Abstract

Natural language inference (NLI) and semantic textual similarity (STS) are key tasks in natural language understanding (NLU). Although several benchmark datasets for those tasks have been released in English and a few other languages, there are no publicly available NLI or STS datasets in the Korean language. Motivated by this, we construct and release new datasets for Korean NLI and STS, dubbed KorNLI and KorSTS, respectively. Following previous approaches, we machine-translate existing English training sets and manually translate development and test sets into Korean. To accelerate research on Korean NLU, we also establish baselines on KorNLI and KorSTS. Our datasets are publicly available at https://github.com/kakaobrain/KorNLUDatasets.
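For readers who want to work with the released data, here is a minimal Python sketch for reading KorNLI- and KorSTS-style files. It assumes tab-separated files with a header row. For KorNLI, the assumed columns are `sentence1`, `sentence2`, and `gold_label`; for KorSTS, they are `sentence1`, `sentence2`, and `score`. These column names, the three-way NLI label set, and the 0 to 5 similarity scale are assumptions borrowed from the English SNLI/MultiNLI and STS benchmark conventions. Check the files in the repository before relying on them. The function names `load_nli_rows` and `load_sts_rows` are hypothetical.

```python
import csv
import io
from typing import List, Tuple

# Assumed label set: the three standard NLI classes (SNLI/MultiNLI convention).
NLI_LABELS = ("entailment", "neutral", "contradiction")


def load_nli_rows(tsv_text: str) -> List[Tuple[str, str, str]]:
    """Parse KorNLI-style TSV text into (premise, hypothesis, label) triples.

    Assumes (hypothetically) a header row naming sentence1, sentence2, gold_label.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    rows = []
    for record in reader:
        label = record["gold_label"].strip()
        if label not in NLI_LABELS:
            # Skip rows without a usable gold label (e.g. "-" in SNLI-style data).
            continue
        rows.append((record["sentence1"], record["sentence2"], label))
    return rows


def load_sts_rows(tsv_text: str) -> List[Tuple[str, str, float]]:
    """Parse KorSTS-style TSV text into (sentence1, sentence2, score) triples.

    Assumes (hypothetically) columns sentence1, sentence2, score, where score is a
    similarity rating on the 0-5 scale of the English STS benchmark.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return [(r["sentence1"], r["sentence2"], float(r["score"])) for r in reader]


if __name__ == "__main__":
    # Hypothetical toy rows for illustration only; not taken from the datasets.
    nli_sample = (
        "sentence1\tsentence2\tgold_label\n"
        "남자가 기타를 치고 있다.\t사람이 악기를 연주하고 있다.\tentailment\n"
        "남자가 기타를 치고 있다.\t남자가 잠을 자고 있다.\tcontradiction\n"
    )
    sts_sample = (
        "score\tsentence1\tsentence2\n"
        "4.2\t남자가 기타를 치고 있다.\t한 남자가 기타를 연주한다.\n"
    )
    print(load_nli_rows(nli_sample))
    print(load_sts_rows(sts_sample))
```

Parsing from a string keeps the sketch self-contained. With real files, one would pass the result of `open(path, encoding="utf-8").read()` instead. `csv.QUOTE_NONE` is used because sentences may contain unbalanced quotation marks, which would otherwise confuse the parser.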

Anthology ID: 2020.findings-emnlp.39
Volume: Findings of the Association for Computational Linguistics: EMNLP 2020
Month: November
Year: 2020
Address: Online
Editors: Trevor Cohn, Yulan He, Yang Liu
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 422–430
URL: https://aclanthology.org/2020.findings-emnlp.39/
DOI: 10.18653/v1/2020.findings-emnlp.39
Bibkey: ham-etal-2020-kornli
Cite (ACL): Jiyeon Ham, Yo Joong Choe, Kyubyong Park, Ilji Choi, and Hyungjoon Soh. 2020. KorNLI and KorSTS: New Benchmark Datasets for Korean Natural Language Understanding. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 422–430, Online. Association for Computational Linguistics.
Cite (Informal): KorNLI and KorSTS: New Benchmark Datasets for Korean Natural Language Understanding (Ham et al., Findings 2020)
PDF: https://aclanthology.org/2020.findings-emnlp.39.pdf

Export citation

BibTeX

@inproceedings{ham-etal-2020-kornli,
    title = "{K}or{NLI} and {K}or{STS}: New Benchmark Datasets for {K}orean Natural Language Understanding",
    author = "Ham, Jiyeon  and
      Choe, Yo Joong  and
      Park, Kyubyong  and
      Choi, Ilji  and
      Soh, Hyungjoon",
    editor = "Cohn, Trevor  and
      He, Yulan  and
      Liu, Yang",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.39/",
    doi = "10.18653/v1/2020.findings-emnlp.39",
    pages = "422--430",
    abstract = "Natural language inference (NLI) and semantic textual similarity (STS) are key tasks in natural language understanding (NLU). Although several benchmark datasets for those tasks have been released in English and a few other languages, there are no publicly available NLI or STS datasets in the Korean language. Motivated by this, we construct and release new datasets for Korean NLI and STS, dubbed KorNLI and KorSTS, respectively. Following previous approaches, we machine-translate existing English training sets and manually translate development and test sets into Korean. To accelerate research on Korean NLU, we also establish baselines on KorNLI and KorSTS. Our datasets are publicly available at \url{https://github.com/kakaobrain/KorNLUDatasets}."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="ham-etal-2020-kornli">
    <titleInfo>
        <title>KorNLI and KorSTS: New Benchmark Datasets for Korean Natural Language Understanding</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Jiyeon</namePart>
        <namePart type="family">Ham</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Yo</namePart>
        <namePart type="given">Joong</namePart>
        <namePart type="family">Choe</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Kyubyong</namePart>
        <namePart type="family">Park</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Ilji</namePart>
        <namePart type="family">Choi</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Hyungjoon</namePart>
        <namePart type="family">Soh</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2020-11</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Findings of the Association for Computational Linguistics: EMNLP 2020</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Trevor</namePart>
            <namePart type="family">Cohn</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Yulan</namePart>
            <namePart type="family">He</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Yang</namePart>
            <namePart type="family">Liu</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Online</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Natural language inference (NLI) and semantic textual similarity (STS) are key tasks in natural language understanding (NLU). Although several benchmark datasets for those tasks have been released in English and a few other languages, there are no publicly available NLI or STS datasets in the Korean language. Motivated by this, we construct and release new datasets for Korean NLI and STS, dubbed KorNLI and KorSTS, respectively. Following previous approaches, we machine-translate existing English training sets and manually translate development and test sets into Korean. To accelerate research on Korean NLU, we also establish baselines on KorNLI and KorSTS. Our datasets are publicly available at https://github.com/kakaobrain/KorNLUDatasets.</abstract>
    <identifier type="citekey">ham-etal-2020-kornli</identifier>
    <identifier type="doi">10.18653/v1/2020.findings-emnlp.39</identifier>
    <location>
        <url>https://aclanthology.org/2020.findings-emnlp.39/</url>
    </location>
    <part>
        <date>2020-11</date>
        <extent unit="page">
            <start>422</start>
            <end>430</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T KorNLI and KorSTS: New Benchmark Datasets for Korean Natural Language Understanding
%A Ham, Jiyeon
%A Choe, Yo Joong
%A Park, Kyubyong
%A Choi, Ilji
%A Soh, Hyungjoon
%Y Cohn, Trevor
%Y He, Yulan
%Y Liu, Yang
%S Findings of the Association for Computational Linguistics: EMNLP 2020
%D 2020
%8 November
%I Association for Computational Linguistics
%C Online
%F ham-etal-2020-kornli
%X Natural language inference (NLI) and semantic textual similarity (STS) are key tasks in natural language understanding (NLU). Although several benchmark datasets for those tasks have been released in English and a few other languages, there are no publicly available NLI or STS datasets in the Korean language. Motivated by this, we construct and release new datasets for Korean NLI and STS, dubbed KorNLI and KorSTS, respectively. Following previous approaches, we machine-translate existing English training sets and manually translate development and test sets into Korean. To accelerate research on Korean NLU, we also establish baselines on KorNLI and KorSTS. Our datasets are publicly available at https://github.com/kakaobrain/KorNLUDatasets.
%R 10.18653/v1/2020.findings-emnlp.39
%U https://aclanthology.org/2020.findings-emnlp.39/
%U https://doi.org/10.18653/v1/2020.findings-emnlp.39
%P 422-430

Markdown (Informal)

[KorNLI and KorSTS: New Benchmark Datasets for Korean Natural Language Understanding](https://aclanthology.org/2020.findings-emnlp.39/) (Ham et al., Findings 2020)