Avoiding Overlap in Data Augmentation for AMR-to-Text Generation

Wenchao Du, Jeffrey Flanigan


Abstract
Leveraging additional unlabeled data to boost model performance is common practice in machine learning and natural language processing. For generation tasks, if there is overlap between the additional data and the target text evaluation data, then training on the additional data amounts to training on the answers of the test set. This leads to overly inflated scores with the additional data compared to real-world testing scenarios, and causes problems when comparing models. We study the AMR dataset and Gigaword, which is commonly used for improving AMR-to-text generators, and find significant overlap between Gigaword and a subset of the AMR dataset. We propose methods for excluding parts of Gigaword to remove this overlap, and show that our approach leads to a more realistic evaluation of the task of AMR-to-text generation. Going forward, we give simple best-practice recommendations for leveraging additional data in AMR-to-text generation.
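
The overlap-removal idea described in the abstract can be illustrated with a small sketch. The snippet below is hypothetical: the function names and the exact-match-after-normalization criterion are assumptions for illustration, not the filtering procedure actually used in the paper. It drops any Gigaword sentence whose normalized form also appears among the AMR corpus reference sentences.

import re

def normalize(sentence: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace for matching."""
    sentence = sentence.lower()
    sentence = re.sub(r"[^\w\s]", "", sentence)
    return " ".join(sentence.split())

def filter_overlap(gigaword_sentences, amr_sentences):
    """Keep only Gigaword sentences whose normalized form does not occur in the AMR references."""
    amr_set = {normalize(s) for s in amr_sentences}
    return [s for s in gigaword_sentences if normalize(s) not in amr_set]

if __name__ == "__main__":
    amr = ["The boy wants to go."]
    giga = ["The boy wants to go .", "A completely different sentence."]
    # Only the non-overlapping sentence survives the filter.
    print(filter_overlap(giga, amr))

More lenient criteria (e.g., near-duplicate or n-gram overlap detection) would remove more aggressively; the paper should be consulted for the exclusion methods it actually evaluates.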
Anthology ID:
2021.acl-short.132
Volume:
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Month:
August
Year:
2021
Address:
Online
Editors:
Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli
Venues:
ACL | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
1043–1048
URL:
https://aclanthology.org/2021.acl-short.132
DOI:
10.18653/v1/2021.acl-short.132
Cite (ACL):
Wenchao Du and Jeffrey Flanigan. 2021. Avoiding Overlap in Data Augmentation for AMR-to-Text Generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 1043–1048, Online. Association for Computational Linguistics.
Cite (Informal):
Avoiding Overlap in Data Augmentation for AMR-to-Text Generation (Du & Flanigan, ACL-IJCNLP 2021)
PDF:
https://aclanthology.org/2021.acl-short.132.pdf
Video:
https://aclanthology.org/2021.acl-short.132.mp4