GraDA: Graph Generative Data Augmentation for Commonsense Reasoning

Adyasha Maharana, Mohit Bansal


Abstract
Recent advances in commonsense reasoning have been fueled by the availability of large-scale human-annotated datasets. Manual annotation of such datasets, many of which are based on existing knowledge bases, is expensive and not scalable. Moreover, it is challenging to build augmentation data for commonsense reasoning because the synthetic questions need to adhere to real-world scenarios. Hence, we present GraDA, a graph-generative data augmentation framework to synthesize factual data samples from knowledge graphs for commonsense reasoning datasets. First, we train a graph-to-text model for conditional generation of questions from graph entities and relations. Then, we train a generator with a GAN loss to generate distractors for synthetic questions. Our approach improves performance on SocialIQA, CODAH, HellaSwag and CommonsenseQA, and works well for generative tasks like ProtoQA. We show improved robustness to semantic adversaries after training with GraDA and provide a human evaluation of the quality of the synthetic datasets in terms of factuality and answerability. Our work provides evidence for, and encourages future research into, graph-based generative data augmentation.
Anthology ID:
2022.dlg4nlp-1.6
Volume:
Proceedings of the 2nd Workshop on Deep Learning on Graphs for Natural Language Processing (DLG4NLP 2022)
Month:
July
Year:
2022
Address:
Seattle, Washington
Editors:
Lingfei Wu, Bang Liu, Rada Mihalcea, Jian Pei, Yue Zhang, Yunyao Li
Venue:
DLG4NLP
Publisher:
Association for Computational Linguistics
Pages:
43–59
URL:
https://aclanthology.org/2022.dlg4nlp-1.6
DOI:
10.18653/v1/2022.dlg4nlp-1.6
Cite (ACL):
Adyasha Maharana and Mohit Bansal. 2022. GraDA: Graph Generative Data Augmentation for Commonsense Reasoning. In Proceedings of the 2nd Workshop on Deep Learning on Graphs for Natural Language Processing (DLG4NLP 2022), pages 43–59, Seattle, Washington. Association for Computational Linguistics.
Cite (Informal):
GraDA: Graph Generative Data Augmentation for Commonsense Reasoning (Maharana & Bansal, DLG4NLP 2022)
PDF:
https://aclanthology.org/2022.dlg4nlp-1.6.pdf