Dual-Stage Multi-Task Syntax-Oriented Pre-Training for Syntactically Controlled Paraphrase Generation

Hongxu Liu, Xiaojie Wang, Jiashen Sun, Ke Zeng, Wan Guanglu


Abstract
Syntactically Controlled Paraphrase Generation (SCPG), which aims to generate sentences whose syntactic structures resemble given exemplars, has attracted increasing research attention in recent years. We conducted an empirical survey of previous SCPG datasets and methods and found three tacitly accepted yet seldom mentioned intrinsic shortcomings/trade-offs in terms of data acquisition, task formulation, and pre-training strategies. To mitigate these shortcomings, we proposed a novel Dual-Stage Multi-Task (DSMT) pre-training scheme involving a series of structure-oriented and syntax-oriented tasks, which, in our view, gives sequential text models the ability to comprehend intrinsically non-sequential structures such as Linearized Constituency Trees (LCTs), to understand the underlying syntax, and even to generate such structures by parsing sentences. We performed further pre-training of the popular T5 model on these novel tasks and fine-tuned the trained model on every variant of the SCPG task in the literature, finding that our models significantly outperformed (by up to 10+ BLEU-4) previous state-of-the-art methods. Finally, we carried out ablation studies that demonstrated the effectiveness of our DSMT methods and highlighted the SCPG performance gains over vanilla T5 models, especially on hard samples and under few-shot settings.
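
As a rough, hypothetical illustration of the Linearized Constituency Trees (LCTs) mentioned in the abstract (not the authors' code), the following Python sketch flattens a bracketed constituency parse into a bracketed label sequence using NLTK; whether terminal words are dropped is an assumption here, and the paper's exact linearization convention may differ.

    from nltk import Tree

    # Bracketed parse of an example sentence, as produced by a constituency parser.
    parse = "(S (NP (PRP She)) (VP (VBZ sells) (NP (NNS seashells))) (. .))"
    tree = Tree.fromstring(parse)

    def linearize(node):
        # Flatten the tree into a bracketed sequence of constituent labels,
        # omitting terminal words so only the syntactic skeleton remains
        # (an assumed convention, not necessarily the paper's).
        if isinstance(node, str):  # leaf = terminal word: skip it
            return []
        tokens = ["(", node.label()]
        for child in node:
            tokens.extend(linearize(child))
        tokens.append(")")
        return tokens

    print(" ".join(linearize(tree)))
    # -> ( S ( NP ( PRP ) ) ( VP ( VBZ ) ( NP ( NNS ) ) ) ( . ) )
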
Anthology ID:
2024.findings-acl.845
Volume:
Findings of the Association for Computational Linguistics: ACL 2024
Month:
August
Year:
2024
Address:
Bangkok, Thailand and virtual meeting
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
14215–14231
URL:
https://aclanthology.org/2024.findings-acl.845
Cite (ACL):
Hongxu Liu, Xiaojie Wang, Jiashen Sun, Ke Zeng, and Wan Guanglu. 2024. Dual-Stage Multi-Task Syntax-Oriented Pre-Training for Syntactically Controlled Paraphrase Generation. In Findings of the Association for Computational Linguistics: ACL 2024, pages 14215–14231, Bangkok, Thailand and virtual meeting. Association for Computational Linguistics.
Cite (Informal):
Dual-Stage Multi-Task Syntax-Oriented Pre-Training for Syntactically Controlled Paraphrase Generation (Liu et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-acl.845.pdf