MATINF: A Jointly Labeled Large-Scale Dataset for Classification, Question Answering and Summarization

Canwen Xu, Jiaxin Pei, Hongtao Wu, Yiyu Liu, Chenliang Li


Abstract
Recently, large-scale datasets have vastly facilitated the development in nearly all domains of Natural Language Processing. However, there is currently no cross-task dataset in NLP, which hinders the development of multi-task learning. We propose MATINF, the first jointly labeled large-scale dataset for classification, question answering and summarization. MATINF contains 1.07 million question-answer pairs with human-labeled categories and user-generated question descriptions. Based on such rich information, MATINF is applicable for three major NLP tasks, including classification, question answering, and summarization. We benchmark existing methods and a novel multi-task baseline over MATINF to inspire further research. Our comprehensive comparison and experiments over MATINF and other datasets demonstrate the merits held by MATINF.
Anthology ID:
2020.acl-main.330
Volume:
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2020
Address:
Online
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
3586–3596
URL:
https://aclanthology.org/2020.acl-main.330
DOI:
10.18653/v1/2020.acl-main.330
PDF:
https://aclanthology.org/2020.acl-main.330.pdf
Video:
http://slideslive.com/38928859
Code
WHUIR/MATINF
Data
MATINF, AG News, DuReader, LCSTS, MS MARCO, NEWSROOM