UniArk: Improving Generalisation and Consistency for Factual Knowledge Extraction through Debiasing

Yijun Yang, Jie He, Pinzhen Chen, Victor Gutierrez Basulto, Jeff Pan


Abstract
Several recent papers have investigated the potential of language models as knowledge bases as well as the existence of severe biases when extracting factual knowledge. In this work, we focus on factual probing performance over prompts unseen during tuning, and from a probabilistic view we show the inherent misalignment between the pre-training and downstream tuning objectives of language models used for probing knowledge. We hypothesize that simultaneously debiasing these objectives can be the key to generalisation over unseen prompts. We propose UniArk, an adapter-based framework for generalised and consistent factual knowledge extraction that uses simple methods and introduces no extra parameters. Extensive experiments show that UniArk significantly improves a model's out-of-domain generalisation as well as its consistency under various prompts. Additionally, we construct ParaTrex, a large-scale and diverse dataset for measuring the inconsistency and out-of-domain generalisation of models. Further, ParaTrex offers a reference method for constructing paraphrased datasets using large language models.
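The abstract describes measuring consistency under paraphrased prompts. As a concrete illustration, here is a minimal sketch of one common way to quantify this, namely ParaRel-style pairwise agreement of top predictions across paraphrases of the same fact; the data format and the predict_object function are illustrative assumptions, not the paper's exact evaluation protocol.

```python
# A minimal sketch (not the paper's exact protocol) of a ParaRel-style
# consistency metric: for each fact, check whether a model's top prediction
# agrees across every pair of paraphrased prompts. `predict_object` and the
# input format are assumptions made for illustration only.
from itertools import combinations
from typing import Callable, Dict, List


def pairwise_consistency(
    paraphrases_per_fact: Dict[str, List[str]],
    predict_object: Callable[[str], str],
) -> float:
    """Return the fraction of paraphrase pairs whose top predictions match."""
    agree, total = 0, 0
    for fact_id, prompts in paraphrases_per_fact.items():
        predictions = [predict_object(p) for p in prompts]
        for a, b in combinations(predictions, 2):
            total += 1
            agree += int(a == b)
    return agree / total if total else 0.0


if __name__ == "__main__":
    # Toy example with a dummy "model" that always answers "Paris".
    facts = {
        "capital-of-France": [
            "The capital of France is [MASK].",
            "France's capital city is [MASK].",
        ]
    }
    print(pairwise_consistency(facts, lambda prompt: "Paris"))  # -> 1.0
```

A lower score indicates greater inconsistency, i.e. the model changes its answer when the same fact is queried with a different surface form.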
Anthology ID:
2024.naacl-long.388
Volume:
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
7011–7028
URL:
https://aclanthology.org/2024.naacl-long.388
Cite (ACL):
Yijun Yang, Jie He, Pinzhen Chen, Victor Gutierrez Basulto, and Jeff Pan. 2024. UniArk: Improving Generalisation and Consistency for Factual Knowledge Extraction through Debiasing. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 7011–7028, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
UniArk: Improving Generalisation and Consistency for Factual Knowledge Extraction through Debiasing (Yang et al., NAACL 2024)
PDF:
https://aclanthology.org/2024.naacl-long.388.pdf
Copyright:
2024.naacl-long.388.copyright.pdf