Extract, Denoise and Enforce: Evaluating and Improving Concept Preservation for Text-to-Text Generation

Yuning Mao, Wenchang Ma, Deren Lei, Jiawei Han, Xiang Ren


Abstract
Prior studies on text-to-text generation typically assume that the model can figure out what to attend to in the input and what to include in the output via seq2seq learning, with only the parallel training data and no additional guidance. However, it remains unclear whether current models can preserve important concepts in the source input, as seq2seq learning has no explicit focus on the concepts and commonly used evaluation metrics treat them as equally important as other tokens. In this paper, we present a systematic analysis that studies whether current seq2seq models, especially pre-trained language models, are good enough at preserving important input concepts and to what extent explicitly guiding generation with the concepts as lexical constraints is beneficial. We answer these questions by conducting extensive analytical experiments on four representative text-to-text generation tasks. Based on the observations, we then propose a simple yet effective framework to automatically extract, denoise, and enforce important input concepts as lexical constraints. This new method performs comparably to or better than its unconstrained counterpart on automatic metrics, demonstrates higher coverage for concept preservation, and receives better ratings in the human evaluation. Our code is available at https://github.com/morningmoni/EDE.
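To make the notion of "coverage for concept preservation" concrete, the following is a minimal sketch of one way such a score could be computed: the fraction of extracted source concepts that reappear in the generated output. It is an illustration only and not necessarily the metric used in the paper; the function names (`tokenize`, `contains_phrase`, `concept_coverage`) and the lowercase exact-phrase matching rule are assumptions made for this example.

```python
import re


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(tokens, phrase_tokens):
    # True if phrase_tokens occurs as a contiguous run inside tokens.
    n = len(phrase_tokens)
    if n == 0:
        return False
    return any(tokens[i:i + n] == phrase_tokens
               for i in range(len(tokens) - n + 1))


def concept_coverage(concepts, output):
    # Fraction of concepts found in the output (assumed definition;
    # exact matching ignores paraphrases and inflections).
    if not concepts:
        return 1.0
    out_tokens = tokenize(output)
    hits = sum(contains_phrase(out_tokens, tokenize(c)) for c in concepts)
    return hits / len(concepts)


if __name__ == "__main__":
    concepts = ["lexical constraints", "BART"]
    output = "We enforce lexical constraints during decoding."
    print(concept_coverage(concepts, output))
```

Under this assumed definition, an unconstrained model that drops a concept lowers the score, while enforcing the concept as a lexical constraint during decoding would guarantee a match for it.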
Anthology ID:
2021.emnlp-main.413
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
5063–5074
URL:
https://aclanthology.org/2021.emnlp-main.413
DOI:
10.18653/v1/2021.emnlp-main.413
Cite (ACL):
Yuning Mao, Wenchang Ma, Deren Lei, Jiawei Han, and Xiang Ren. 2021. Extract, Denoise and Enforce: Evaluating and Improving Concept Preservation for Text-to-Text Generation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 5063–5074, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Extract, Denoise and Enforce: Evaluating and Improving Concept Preservation for Text-to-Text Generation (Mao et al., EMNLP 2021)
PDF:
https://aclanthology.org/2021.emnlp-main.413.pdf
Video:
https://aclanthology.org/2021.emnlp-main.413.mp4
Code:
morningmoni/ede
Data:
SQuAD