Knowledge-Guided Paraphrase Identification

Haoyu Wang, Fenglong Ma, Yaqing Wang, Jing Gao


Abstract
Paraphrase identification (PI), a fundamental task in natural language processing, is to determine whether two sentences express the same or similar meaning, which makes it a binary classification problem. Recently, BERT-like pre-trained language models have become a popular backbone for PI models, but almost all existing methods target general-domain text. When these approaches are applied to a specific domain, existing models cannot make accurate predictions because they lack professional domain knowledge. To address this challenge, we propose a novel framework that leverages external, unstructured Wikipedia knowledge to identify paraphrases accurately. We propose to mine outline knowledge of concepts related to the given sentences from Wikipedia using the BM25 model. After retrieving the related outline knowledge, the framework makes predictions based on both the semantic information of the two sentences and the outline knowledge. In addition, we propose a gating mechanism to aggregate the semantic-information-based prediction and the knowledge-based prediction. Extensive experiments are conducted on two public datasets: PARADE (a computer science domain dataset) and clinicalSTS2019 (a biomedical domain dataset). The results show that the proposed framework outperforms state-of-the-art methods.
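The abstract names two mechanisms: BM25 retrieval of Wikipedia outline knowledge and a gate that combines two predictions. The sketch below illustrates both under stated assumptions. It scores pre-tokenized outline snippets with standard Okapi BM25 (the common defaults k1=1.5, b=0.75 and the Lucene-style "+1" inside the IDF log are assumptions, not values from the paper). The gate is a hypothetical sigmoid-weighted convex combination of the two probabilities; the abstract does not specify the authors' exact gating formula. The helper names `bm25_scores`, `top_k`, and `gated_fusion` are illustrative only.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against a tokenized query with Okapi BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of docs containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            # Lucene-style IDF (assumption): the "+1" keeps it non-negative.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            num = tf[term] * (k1 + 1)
            den = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * num / den
        scores.append(s)
    return scores


def top_k(query, docs, k=1):
    """Return indices of the k highest-scoring docs (hypothetical helper)."""
    scores = bm25_scores(query, docs)
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:k]


def gated_fusion(p_sem, p_know, gate_logit):
    """Hypothetical gate: g = sigmoid(gate_logit); p = g*p_sem + (1-g)*p_know."""
    g = 1.0 / (1.0 + math.exp(-gate_logit))
    return g * p_sem + (1.0 - g) * p_know


if __name__ == "__main__":
    outlines = [
        "hash table collision chaining".split(),
        "gradient descent learning rate".split(),
        "binary search tree rotation".split(),
    ]
    query = "collision in a hash table".split()
    print(top_k(query, outlines, k=1))
    print(gated_fusion(0.8, 0.4, 0.0))
```

In this toy run, only the first outline shares terms with the query, so it is retrieved first; with a gate logit of 0 the sigmoid gives equal weight to both predictions. In a real model the gate logit would presumably be produced by a learned layer over the sentence and knowledge representations, which this sketch does not attempt to reproduce.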
Anthology ID:
2021.findings-emnlp.72
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
843–853
URL:
https://aclanthology.org/2021.findings-emnlp.72
DOI:
10.18653/v1/2021.findings-emnlp.72
Cite (ACL):
Haoyu Wang, Fenglong Ma, Yaqing Wang, and Jing Gao. 2021. Knowledge-Guided Paraphrase Identification. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 843–853, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Knowledge-Guided Paraphrase Identification (Wang et al., Findings 2021)
PDF:
https://aclanthology.org/2021.findings-emnlp.72.pdf
Video:
https://aclanthology.org/2021.findings-emnlp.72.mp4
Data
BLUE