NorQuAD: Norwegian Question Answering Dataset

Sardana Ivanova; Fredrik Andreassen; Matias Jentoft; Sondre Wold; Lilja Øvrelid

Abstract

In this paper we present NorQuAD: the first Norwegian question answering dataset for machine reading comprehension. The dataset consists of 4,752 manually created question-answer pairs. We here detail the data collection procedure and present statistics of the dataset. We also benchmark several multilingual and Norwegian monolingual language models on the dataset and compare them against human performance. The dataset will be made freely available.

Anthology ID: 2023.nodalida-1.17
Volume: Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
Month: May
Year: 2023
Address: Tórshavn, Faroe Islands
Editors: Tanel Alumäe, Mark Fishel
Venue: NoDaLiDa
Publisher: University of Tartu Library
Pages: 159–168
URL: https://aclanthology.org/2023.nodalida-1.17/
Bibkey: ivanova-etal-2023-norquad
Cite (ACL): Sardana Ivanova, Fredrik Andreassen, Matias Jentoft, Sondre Wold, and Lilja Øvrelid. 2023. NorQuAD: Norwegian Question Answering Dataset. In Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), pages 159–168, Tórshavn, Faroe Islands. University of Tartu Library.
Cite (Informal): NorQuAD: Norwegian Question Answering Dataset (Ivanova et al., NoDaLiDa 2023)
PDF: https://aclanthology.org/2023.nodalida-1.17.pdf

Export citation

BibTeX
MODS XML
Endnote
Preformatted

@inproceedings{ivanova-etal-2023-norquad,
    title = "{N}or{Q}u{AD}: {N}orwegian Question Answering Dataset",
    author = "Ivanova, Sardana  and
      Andreassen, Fredrik  and
      Jentoft, Matias  and
      Wold, Sondre  and
      {\O}vrelid, Lilja",
    editor = {Alum{\"a}e, Tanel  and
      Fishel, Mark},
    booktitle = "Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)",
    month = may,
    year = "2023",
    address = "T{\'o}rshavn, Faroe Islands",
    publisher = "University of Tartu Library",
    url = "https://aclanthology.org/2023.nodalida-1.17/",
    pages = "159--168",
    abstract = "In this paper we present NorQuAD: the first Norwegian question answering dataset for machine reading comprehension. The dataset consists of 4,752 manually created question-answer pairs. We here detail the data collection procedure and present statistics of the dataset. We also benchmark several multilingual and Norwegian monolingual language models on the dataset and compare them against human performance. The dataset will be made freely available."
}

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="ivanova-etal-2023-norquad">
    <titleInfo>
        <title>NorQuAD: Norwegian Question Answering Dataset</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Sardana</namePart>
        <namePart type="family">Ivanova</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Fredrik</namePart>
        <namePart type="family">Andreassen</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Matias</namePart>
        <namePart type="family">Jentoft</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Sondre</namePart>
        <namePart type="family">Wold</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Lilja</namePart>
        <namePart type="family">Øvrelid</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2023-05</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Tanel</namePart>
            <namePart type="family">Alumäe</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Mark</namePart>
            <namePart type="family">Fishel</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>University of Tartu Library</publisher>
            <place>
                <placeTerm type="text">Tórshavn, Faroe Islands</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>In this paper we present NorQuAD: the first Norwegian question answering dataset for machine reading comprehension. The dataset consists of 4,752 manually created question-answer pairs. We here detail the data collection procedure and present statistics of the dataset. We also benchmark several multilingual and Norwegian monolingual language models on the dataset and compare them against human performance. The dataset will be made freely available.</abstract>
    <identifier type="citekey">ivanova-etal-2023-norquad</identifier>
    <location>
        <url>https://aclanthology.org/2023.nodalida-1.17/</url>
    </location>
    <part>
        <date>2023-05</date>
        <extent unit="page">
            <start>159</start>
            <end>168</end>
        </extent>
    </part>
</mods>
</modsCollection>

%0 Conference Proceedings
%T NorQuAD: Norwegian Question Answering Dataset
%A Ivanova, Sardana
%A Andreassen, Fredrik
%A Jentoft, Matias
%A Wold, Sondre
%A Øvrelid, Lilja
%Y Alumäe, Tanel
%Y Fishel, Mark
%S Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
%D 2023
%8 May
%I University of Tartu Library
%C Tórshavn, Faroe Islands
%F ivanova-etal-2023-norquad
%X In this paper we present NorQuAD: the first Norwegian question answering dataset for machine reading comprehension. The dataset consists of 4,752 manually created question-answer pairs. We here detail the data collection procedure and present statistics of the dataset. We also benchmark several multilingual and Norwegian monolingual language models on the dataset and compare them against human performance. The dataset will be made freely available.
%U https://aclanthology.org/2023.nodalida-1.17/
%P 159-168

Markdown (Informal)

[NorQuAD: Norwegian Question Answering Dataset](https://aclanthology.org/2023.nodalida-1.17/) (Ivanova et al., NoDaLiDa 2023)

ACL

Sardana Ivanova, Fredrik Andreassen, Matias Jentoft, Sondre Wold, and Lilja Øvrelid. 2023. NorQuAD: Norwegian Question Answering Dataset. In Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), pages 159–168, Tórshavn, Faroe Islands. University of Tartu Library.