Probing Pre-trained Language Models for Semantic Attributes and their Values

Meriem Beloucif, Chris Biemann


Abstract
Pre-trained language models (PTLMs) yield state-of-the-art performance on many natural language processing tasks, including syntax, semantics and commonsense. In this paper, we focus on identifying to what extent PTLMs capture semantic attributes and their values, e.g., the correlation between rich and high net worth. We use PTLMs to predict masked tokens using patterns and lists of items from Wikidata in order to verify how likely PTLMs are to encode semantic attributes along with their values. Such semantic inferences are intuitive for humans as part of our language understanding. Since PTLMs are trained on large amounts of Wikipedia data, we would assume that they can generate similar predictions, yet our findings reveal that PTLMs are still much worse than humans on this task. We show evidence and analysis explaining how to exploit our methodology to better integrate context and semantics into PTLMs using knowledge bases.
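The masked-token probing setup described in the abstract can be illustrated with a minimal sketch (this is not the authors' released code; the model choice, the example pattern, and the candidate answer are assumptions for illustration only):

# Minimal sketch of masked-token probing for semantic attributes and values,
# assuming bert-base-uncased as the probed PTLM and a hand-written pattern;
# the paper's actual patterns and item lists are derived from Wikidata.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

# Probe whether the PTLM associates the attribute "rich" with a high net worth.
pattern = "Rich people usually have a [MASK] net worth."
for prediction in unmasker(pattern, top_k=5):
    print(f"{prediction['token_str']:>12}  {prediction['score']:.4f}")

A human would expect a value like "high" to rank near the top of the predictions; the paper reports that PTLMs frequently fail such attribute-value probes despite being trained on Wikipedia-scale data.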
Anthology ID:
2021.findings-emnlp.218
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
2554–2559
URL:
https://aclanthology.org/2021.findings-emnlp.218
DOI:
10.18653/v1/2021.findings-emnlp.218
Bibkey:
Cite (ACL):
Meriem Beloucif and Chris Biemann. 2021. Probing Pre-trained Language Models for Semantic Attributes and their Values. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 2554–2559, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Probing Pre-trained Language Models for Semantic Attributes and their Values (Beloucif & Biemann, Findings 2021)
PDF:
https://aclanthology.org/2021.findings-emnlp.218.pdf
Video:
https://aclanthology.org/2021.findings-emnlp.218.mp4
Code:
uhh-lt/semantic-probing
Data:
FrameNet