Exploring Data Augmentation in Neural DRS-to-Text Generation

Muhammad Saad Amin; Luca Anselma; Alessandro Mazzei

doi:10.18653/v1/2024.eacl-long.132

Abstract

Neural networks are notoriously data-hungry. This is a problem when data are scarce, as in low-resource languages. Data augmentation is a technique commonly used in computer vision to provide neural networks with more data and increase their generalization power. For natural language, however, simple augmentation techniques like those used in computer vision, such as rotation and cropping, cannot be employed because they would generate ungrammatical texts. Thus, data augmentation needs a specific design in the case of neural logic-to-text systems, especially for structurally rich input formats such as those used for meaning representation. This is the case for neural natural language generation from Discourse Representation Structures (DRS-to-Text), where the logical nature of DRS calls for a specific design of data augmentation. In this paper, we adopt a novel approach in DRS-to-Text to selectively augment a training set with new data by adding and varying two specific lexical categories, i.e. proper and common nouns. In particular, we propose using WordNet supersenses to produce new training sentences using both in-context and out-of-context nouns. We present a number of experiments for evaluating the role played by augmented lexical information. The experimental results demonstrate the effectiveness of our approach for data augmentation in DRS-to-Text generation.
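As a rough illustration of the idea of varying common nouns within a supersense, the sketch below swaps a noun for another noun carrying the same supersense label, applying the same substitution to both sides of a training pair so that meaning and text stay aligned. It is not the authors' implementation: the lexicon `SUPERSENSE_NOUNS`, the helpers `supersense_of` and `augment`, and the flat token list standing in for a DRS are all hypothetical simplifications. A real setup would presumably draw supersenses from WordNet and operate on actual DRS structures.

```python
import random

# Hypothetical toy lexicon: supersense label -> common nouns sharing it.
# The labels imitate WordNet supersense names; the entries are illustrative only.
SUPERSENSE_NOUNS = {
    "noun.animal": ["dog", "cat", "horse"],
    "noun.food": ["bread", "soup", "rice"],
}


def supersense_of(noun):
    """Return the supersense label of `noun` in the toy lexicon, or None."""
    for label, nouns in SUPERSENSE_NOUNS.items():
        if noun in nouns:
            return label
    return None


def augment(meaning, text, noun, rng):
    """Replace `noun` with another noun of the same supersense.

    `meaning` is a flat list of tokens standing in for a DRS (an assumption
    made for brevity); `text` is a whitespace-tokenized sentence.
    Returns a new (meaning, text) pair, or None if no substitute exists.
    """
    label = supersense_of(noun)
    if label is None:
        return None
    candidates = [n for n in SUPERSENSE_NOUNS[label] if n != noun]
    if not candidates:
        return None
    new_noun = rng.choice(candidates)
    # Apply the same substitution on both sides so the pair stays consistent.
    new_meaning = [new_noun if tok == noun else tok for tok in meaning]
    new_text = " ".join(new_noun if w == noun else w for w in text.split())
    return new_meaning, new_text


rng = random.Random(0)
pair = augment(["x1", "dog", "x1", "sleep"], "the dog sleeps", "dog", rng)
print(pair)
```

Restricting substitutes to the same supersense is one plausible way to keep the augmented sentence semantically sensible; a noun absent from the lexicon is simply left unaugmented.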

Anthology ID: 2024.eacl-long.132
Volume: Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: March
Year: 2024
Address: St. Julian’s, Malta
Editors: Yvette Graham, Matthew Purver
Venue: EACL
Publisher: Association for Computational Linguistics
Pages: 2164–2178
URL: https://aclanthology.org/2024.eacl-long.132/
DOI: 10.18653/v1/2024.eacl-long.132
Cite (ACL): Muhammad Saad Amin, Luca Anselma, and Alessandro Mazzei. 2024. Exploring Data Augmentation in Neural DRS-to-Text Generation. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2164–2178, St. Julian’s, Malta. Association for Computational Linguistics.
Cite (Informal): Exploring Data Augmentation in Neural DRS-to-Text Generation (Amin et al., EACL 2024)
PDF: https://aclanthology.org/2024.eacl-long.132.pdf
Video: https://aclanthology.org/2024.eacl-long.132.mp4
