Global Entity Disambiguation with BERT

Ikuya Yamada, Koki Washio, Hiroyuki Shindo, Yuji Matsumoto


Abstract
We propose a global entity disambiguation (ED) model based on BERT. To capture global contextual information for ED, our model treats not only words but also entities as input tokens, and solves the task by sequentially resolving mentions to their referent entities and using resolved entities as inputs at each step. We train the model using a large entity-annotated corpus obtained from Wikipedia. We achieve new state-of-the-art results on five standard ED datasets: AIDA-CoNLL, MSNBC, AQUAINT, ACE2004, and WNED-WIKI. The source code and model checkpoint are available at https://github.com/studio-ousia/luke.
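The sequential procedure described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the function `resolve_sequentially`, the toy scorer `toy_score`, the `PRIOR` and `RELATED` tables, and the left-to-right resolution order are hypothetical and are not taken from the paper or from the studio-ousia/luke code. The toy scorer stands in for the BERT-based model and shows only how already-resolved entities can feed into later decisions.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical scorer signature: (mention, candidate, resolved entities) -> score.
Scorer = Callable[[str, str, Sequence[str]], float]


def resolve_sequentially(
    mentions: Sequence[str],
    candidates: Dict[str, List[str]],
    score: Scorer,
) -> List[Tuple[str, str]]:
    """Resolve mentions one at a time, feeding resolved entities to later steps."""
    resolved: List[str] = []
    for mention in mentions:
        # Score each candidate given the entities chosen so far (assumed order:
        # left to right; the actual model may order its decisions differently).
        best = max(candidates[mention], key=lambda c: score(mention, c, resolved))
        resolved.append(best)
    return list(zip(mentions, resolved))


# Toy stand-ins for a learned model (illustrative values, not real data).
PRIOR = {
    ("Chicago Bulls", "Chicago Bulls"): 1.0,
    ("Jordan", "Jordan (country)"): 0.7,
    ("Jordan", "Michael Jordan"): 0.3,
}
RELATED = {frozenset(("Michael Jordan", "Chicago Bulls"))}


def toy_score(mention: str, candidate: str, resolved: Sequence[str]) -> float:
    """Local prior plus a bonus for each related entity that is already resolved."""
    prior = PRIOR.get((mention, candidate), 0.0)
    coherence = sum(0.5 for e in resolved if frozenset((candidate, e)) in RELATED)
    return prior + coherence


CANDIDATES = {
    "Chicago Bulls": ["Chicago Bulls"],
    "Jordan": ["Jordan (country)", "Michael Jordan"],
}

result = resolve_sequentially(["Chicago Bulls", "Jordan"], CANDIDATES, toy_score)
print(result)
```

In this toy setup, "Jordan" on its own resolves to the candidate with the higher prior, while resolving "Chicago Bulls" first shifts the decision to "Michael Jordan". That is the kind of global context the abstract refers to.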
Anthology ID:
2022.naacl-main.238
Volume:
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
July
Year:
2022
Address:
Seattle, United States
Editors:
Marine Carpuat, Marie-Catherine de Marneffe, Ivan Vladimir Meza Ruiz
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
3264–3271
URL:
https://aclanthology.org/2022.naacl-main.238
DOI:
10.18653/v1/2022.naacl-main.238
Cite (ACL):
Ikuya Yamada, Koki Washio, Hiroyuki Shindo, and Yuji Matsumoto. 2022. Global Entity Disambiguation with BERT. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3264–3271, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
Global Entity Disambiguation with BERT (Yamada et al., NAACL 2022)
PDF:
https://aclanthology.org/2022.naacl-main.238.pdf
Video:
https://aclanthology.org/2022.naacl-main.238.mp4
Code:
studio-ousia/luke
Data:
ACE 2004, AIDA CoNLL-YAGO, AQUAINT, CoNLL