Investigating the Robustness of Natural Language Generation from Logical Forms via Counterfactual Samples

Chengyuan Liu; Leilei Gan; Kun Kuang; Fei Wu

doi:10.18653/v1/2022.emnlp-main.370

Abstract

The aim of Logic2Text is to generate controllable and faithful texts conditioned on tables and logical forms, which requires not only a deep understanding of the tables and logical forms but also symbolic reasoning over the tables according to the logical forms. State-of-the-art methods based on pre-trained models have achieved remarkable performance on the standard test dataset. However, we question whether these methods really learn to perform logical reasoning, rather than relying on spurious correlations between the headers of the tables and the operators of the logical forms. To test this hypothesis, we manually construct a set of counterfactual samples by modifying the original logical forms so that they contain rarely co-occurring headers and operators, and we write the corresponding counterfactual references. State-of-the-art methods perform much worse on these counterfactual samples than on the original test dataset, which verifies our hypothesis. To deal with this problem, we first analyze the bias from a causal perspective, and on that basis propose two approaches to reduce the model’s reliance on the shortcut. The first incorporates the hierarchical structure of the logical forms into the model. The second exploits automatically generated counterfactual data for training. Automatic and manual evaluation on the original test dataset and the counterfactual dataset shows that our method is effective at alleviating the spurious correlation. Our work points out the weakness of current methods and takes a further step toward developing Logic2Text models with real logical reasoning ability.
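The idea of a counterfactual logical form with a rarely co-occurring header and operator can be illustrated with a minimal sketch. It is not the authors' procedure: the flat `{"header", "operator"}` representation, the `SWAPS` table, and the function names are all assumptions made for illustration, and real Logic2Text logical forms are nested rather than flat.

```python
from collections import Counter

# Hypothetical swap table: operators assumed to be interchangeable on the
# same kind of column. The paper does not specify such a table.
SWAPS = {
    "argmax": ["argmin"],
    "argmin": ["argmax"],
    "max": ["min", "avg"],
    "min": ["max", "avg"],
    "avg": ["max", "min"],
}


def count_pairs(forms):
    """Count how often each (header, operator) pair appears in the data."""
    return Counter((f["header"], f["operator"]) for f in forms)


def counterfactual(form, counts):
    """Swap the operator for a rarer one on the same header.

    Returns None when no swap candidate exists or when no candidate is
    strictly rarer than the original pairing.
    """
    candidates = SWAPS.get(form["operator"], [])
    if not candidates:
        return None
    header = form["header"]
    rare = min(candidates, key=lambda op: counts[(header, op)])
    if counts[(header, rare)] >= counts[(header, form["operator"])]:
        return None
    return {"header": header, "operator": rare}


# Toy training set (invented): "attendance" mostly appears with "argmax".
train = [{"header": "attendance", "operator": "argmax"}] * 3
train.append({"header": "attendance", "operator": "argmin"})
counts = count_pairs(train)
sample = counterfactual({"header": "attendance", "operator": "argmax"}, counts)
```

In this toy data the frequent pairing (`attendance`, `argmax`) is rewritten to the rare pairing (`attendance`, `argmin`). A model that relies on the header to guess the operator would be expected to fail on such a sample, and a matching counterfactual reference text would still have to be written or generated.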

Anthology ID: 2022.emnlp-main.370
Volume: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 5499–5512
URL: https://aclanthology.org/2022.emnlp-main.370/
DOI: 10.18653/v1/2022.emnlp-main.370
Cite (ACL): Chengyuan Liu, Leilei Gan, Kun Kuang, and Fei Wu. 2022. Investigating the Robustness of Natural Language Generation from Logical Forms via Counterfactual Samples. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 5499–5512, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): Investigating the Robustness of Natural Language Generation from Logical Forms via Counterfactual Samples (Liu et al., EMNLP 2022)
PDF: https://aclanthology.org/2022.emnlp-main.370.pdf
