GBT: Generative Boosting Training Approach for Paraphrase Identification

Rui Peng, Zhiling Jin, Yu Hong


Abstract
Paraphrase Identification (PI), the task of determining whether a pair of sentences express the same meaning, is widely applied in Information Retrieval and Question Answering. Data Augmentation (DA) has proven effective in tackling the PI task. However, the majority of DA methods still suffer from two limitations: inefficiency and poor quality. In this study, we propose the Generative Boosting Training (GBT) approach for PI. GBT designs a boosting learning method for a single model, inspired by the human learning process, utilizing a seq2seq model to periodically perform DA on misclassified instances. We conduct experiments on the benchmark corpora QQP and LCQMC, covering both English and Chinese PI tasks. Experimental results show that our method yields significant improvements over a variety of Pre-trained Language Model (PLM) based baselines while remaining efficient. It is noteworthy that a single BERT model (with a linear classifier) can outperform state-of-the-art PI models when boosted with GBT.
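For readers who want the gist of the training loop, the following Python sketch shows one way to realize the procedure the abstract describes: train a PLM classifier, periodically collect misclassified pairs, and let a seq2seq model paraphrase them into new training instances. This is our reading of the abstract only, not the authors' released code; the model names ("bert-base-uncased", "t5-small"), the "paraphrase:" prompt, and all hyperparameters are placeholder assumptions.

# Minimal GBT-style sketch, assuming HuggingFace transformers and PyTorch.
# All model choices and hyperparameters are illustrative, not the paper's.
import torch
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          AutoModelForSeq2SeqLM)

clf_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
clf = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)        # PI classifier (placeholder PLM)
gen_tok = AutoTokenizer.from_pretrained("t5-small")
gen = AutoModelForSeq2SeqLM.from_pretrained("t5-small")  # seq2seq augmenter
opt = torch.optim.AdamW(clf.parameters(), lr=2e-5)

def predict(pairs):
    """Classify a batch of (s1, s2) pairs; return predicted labels."""
    enc = clf_tok([a for a, b in pairs], [b for a, b in pairs],
                  padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = clf(**enc).logits
    return logits.argmax(dim=-1).tolist()

def paraphrase(sentence):
    """Generate one paraphrase of `sentence` with the seq2seq model."""
    enc = gen_tok("paraphrase: " + sentence, return_tensors="pt")
    out = gen.generate(**enc, max_new_tokens=64, do_sample=True, top_p=0.95)
    return gen_tok.decode(out[0], skip_special_tokens=True)

def gbt_train(train_set, epochs=3, period=1):
    """train_set: list of ((s1, s2), label); augment misclassified pairs."""
    data = list(train_set)
    for epoch in range(epochs):
        clf.train()
        for (s1, s2), label in data:   # per-example steps; batching omitted
            enc = clf_tok(s1, s2, truncation=True, return_tensors="pt")
            loss = clf(**enc, labels=torch.tensor([label])).loss
            loss.backward()
            opt.step()
            opt.zero_grad()
        if (epoch + 1) % period == 0:  # periodic boosting step
            clf.eval()
            preds = predict([p for p, _ in data])
            wrong = [(p, y) for (p, y), yhat in zip(data, preds) if yhat != y]
            # Paraphrase one side of each misclassified pair, keep the gold
            # label, and fold the new pairs back into the training set.
            data += [((paraphrase(s1), s2), y) for (s1, s2), y in wrong]
    return clf

In practice the generator would be fine-tuned for paraphrasing and the loop batched; the sketch only fixes the boosting structure: the harder an instance, the more augmented variants of it enter later epochs.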
Anthology ID:
2023.findings-emnlp.405
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
6094–6103
URL:
https://aclanthology.org/2023.findings-emnlp.405
DOI:
10.18653/v1/2023.findings-emnlp.405
Cite (ACL):
Rui Peng, Zhiling Jin, and Yu Hong. 2023. GBT: Generative Boosting Training Approach for Paraphrase Identification. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 6094–6103, Singapore. Association for Computational Linguistics.
Cite (Informal):
GBT: Generative Boosting Training Approach for Paraphrase Identification (Peng et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-emnlp.405.pdf