@inproceedings{xiong-etal-2019-tweetqa,
    title = "{TWEETQA}: A Social Media Focused Question Answering Dataset",
    author = "Xiong, Wenhan  and
      Wu, Jiawei  and
      Wang, Hong  and
      Kulkarni, Vivek  and
      Yu, Mo  and
      Chang, Shiyu  and
      Guo, Xiaoxiao  and
      Wang, William Yang",
    editor = "Korhonen, Anna  and
      Traum, David  and
      M{\`a}rquez, Llu{\'i}s",
    booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/P19-1496/",
    doi = "10.18653/v1/P19-1496",
    pages = "5020--5031",
    abstract = "With social media becoming increasingly popular on which lots of news and real-time events are reported, developing automated question answering systems is critical to the effective-ness of many applications that rely on real-time knowledge. While previous datasets have concentrated on question answering (QA) for formal text like news and Wikipedia, we present the first large-scale dataset for QA over social media data. To ensure that the tweets we collected are useful, we only gather tweets used by journalists to write news articles. We then ask human annotators to write questions and answers upon these tweets. Unlike otherQA datasets like SQuAD in which the answers are extractive, we allow the answers to be abstractive. We show that two recently proposed neural models that perform well on formal texts are limited in their performance when applied to our dataset. In addition, even the fine-tuned BERT model is still lagging behind human performance with a large margin. Our results thus point to the need of improved QA systems targeting social media text."
}<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="xiong-etal-2019-tweetqa">
    <titleInfo>
        <title>TWEETQA: A Social Media Focused Question Answering Dataset</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Wenhan</namePart>
        <namePart type="family">Xiong</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Jiawei</namePart>
        <namePart type="family">Wu</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Hong</namePart>
        <namePart type="family">Wang</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Vivek</namePart>
        <namePart type="family">Kulkarni</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Mo</namePart>
        <namePart type="family">Yu</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Shiyu</namePart>
        <namePart type="family">Chang</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Xiaoxiao</namePart>
        <namePart type="family">Guo</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">William</namePart>
        <namePart type="given">Yang</namePart>
        <namePart type="family">Wang</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2019-07</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Anna</namePart>
            <namePart type="family">Korhonen</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">David</namePart>
            <namePart type="family">Traum</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Lluís</namePart>
            <namePart type="family">Màrquez</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Florence, Italy</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>With social media becoming increasingly popular on which lots of news and real-time events are reported, developing automated question answering systems is critical to the effective-ness of many applications that rely on real-time knowledge. While previous datasets have concentrated on question answering (QA) for formal text like news and Wikipedia, we present the first large-scale dataset for QA over social media data. To ensure that the tweets we collected are useful, we only gather tweets used by journalists to write news articles. We then ask human annotators to write questions and answers upon these tweets. Unlike otherQA datasets like SQuAD in which the answers are extractive, we allow the answers to be abstractive. We show that two recently proposed neural models that perform well on formal texts are limited in their performance when applied to our dataset. In addition, even the fine-tuned BERT model is still lagging behind human performance with a large margin. Our results thus point to the need of improved QA systems targeting social media text.</abstract>
    <identifier type="citekey">xiong-etal-2019-tweetqa</identifier>
    <identifier type="doi">10.18653/v1/P19-1496</identifier>
    <location>
        <url>https://aclanthology.org/P19-1496/</url>
    </location>
    <part>
        <date>2019-07</date>
        <extent unit="page">
            <start>5020</start>
            <end>5031</end>
        </extent>
    </part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T TWEETQA: A Social Media Focused Question Answering Dataset
%A Xiong, Wenhan
%A Wu, Jiawei
%A Wang, Hong
%A Kulkarni, Vivek
%A Yu, Mo
%A Chang, Shiyu
%A Guo, Xiaoxiao
%A Wang, William Yang
%Y Korhonen, Anna
%Y Traum, David
%Y Màrquez, Lluís
%S Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
%D 2019
%8 July
%I Association for Computational Linguistics
%C Florence, Italy
%F xiong-etal-2019-tweetqa
%X With social media becoming increasingly popular on which lots of news and real-time events are reported, developing automated question answering systems is critical to the effective-ness of many applications that rely on real-time knowledge. While previous datasets have concentrated on question answering (QA) for formal text like news and Wikipedia, we present the first large-scale dataset for QA over social media data. To ensure that the tweets we collected are useful, we only gather tweets used by journalists to write news articles. We then ask human annotators to write questions and answers upon these tweets. Unlike otherQA datasets like SQuAD in which the answers are extractive, we allow the answers to be abstractive. We show that two recently proposed neural models that perform well on formal texts are limited in their performance when applied to our dataset. In addition, even the fine-tuned BERT model is still lagging behind human performance with a large margin. Our results thus point to the need of improved QA systems targeting social media text.
%R 10.18653/v1/P19-1496
%U https://aclanthology.org/P19-1496/
%U https://doi.org/10.18653/v1/P19-1496
%P 5020-5031
Markdown (Informal)
[TWEETQA: A Social Media Focused Question Answering Dataset](https://aclanthology.org/P19-1496/) (Xiong et al., ACL 2019)
ACL
- Wenhan Xiong, Jiawei Wu, Hong Wang, Vivek Kulkarni, Mo Yu, Shiyu Chang, Xiaoxiao Guo, and William Yang Wang. 2019. TWEETQA: A Social Media Focused Question Answering Dataset. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5020–5031, Florence, Italy. Association for Computational Linguistics.