NUT-RC: Noisy User-generated Text-oriented Reading Comprehension

Rongtao Huang, Bowei Zou, Yu Hong, Wei Zhang, AiTi Aw, Guodong Zhou


Abstract
Reading comprehension (RC) on social media such as Twitter is a critical and challenging task because the text is noisy and informal yet informative. Most existing RC models are developed on formal datasets such as news articles and Wikipedia documents, which severely limits their performance when they are applied directly to the noisy, informal text of social media. Moreover, these models focus on a single type of RC, either extractive or generative, and ignore the integration of the two. To address these challenges, we propose a noisy user-generated text-oriented RC model. In particular, we first introduce a set of text normalizers that transform noisy, informal text into formal text. Then, we integrate the extractive and generative RC models through a multi-task learning mechanism and an answer selection module. Experimental results on TweetQA demonstrate that our NUT-RC model significantly outperforms the state-of-the-art social media-oriented RC models.
Anthology ID:
2020.coling-main.242
Volume:
Proceedings of the 28th International Conference on Computational Linguistics
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Donia Scott, Nuria Bel, Chengqing Zong
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
2687–2698
URL:
https://aclanthology.org/2020.coling-main.242
DOI:
10.18653/v1/2020.coling-main.242
Cite (ACL):
Rongtao Huang, Bowei Zou, Yu Hong, Wei Zhang, AiTi Aw, and Guodong Zhou. 2020. NUT-RC: Noisy User-generated Text-oriented Reading Comprehension. In Proceedings of the 28th International Conference on Computational Linguistics, pages 2687–2698, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal):
NUT-RC: Noisy User-generated Text-oriented Reading Comprehension (Huang et al., COLING 2020)
PDF:
https://aclanthology.org/2020.coling-main.242.pdf
Code:
whalefallzz/nut_rc
Data:
SQuAD, TweetQA