%0 Conference Proceedings
%T Open Knowledge Graphs Canonicalization using Variational Autoencoders
%A Dash, Sarthak
%A Rossiello, Gaetano
%A Mihindukulasooriya, Nandana
%A Bagchi, Sugato
%A Gliozzo, Alfio
%Y Moens, Marie-Francine
%Y Huang, Xuanjing
%Y Specia, Lucia
%Y Yih, Scott Wen-tau
%S Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
%D 2021
%8 November
%I Association for Computational Linguistics
%C Online and Punta Cana, Dominican Republic
%F dash-etal-2021-open
%X Noun phrases and relation phrases in open knowledge graphs are not canonicalized, leading to an explosion of redundant and ambiguous subject-relation-object triples. Existing solutions to this problem take a two-step approach: first, they generate embedding representations for both noun and relation phrases; then, they apply a clustering algorithm that groups the phrases using the embeddings as features. In this work, we propose Canonicalizing Using Variational AutoEncoders and Side Information (CUVA), a joint model that learns both embeddings and cluster assignments end to end, which leads to better vector representations for the noun and relation phrases. Our evaluation over multiple benchmarks shows that CUVA outperforms the existing state-of-the-art approaches. Moreover, we introduce CanonicNell, a novel dataset to evaluate entity canonicalization systems.
%R 10.18653/v1/2021.emnlp-main.811
%U https://aclanthology.org/2021.emnlp-main.811
%U https://doi.org/10.18653/v1/2021.emnlp-main.811
%P 10379-10394