%0 Conference Proceedings
%T Open Knowledge Graphs Canonicalization using Variational Autoencoders
%A Dash, Sarthak
%A Rossiello, Gaetano
%A Mihindukulasooriya, Nandana
%A Bagchi, Sugato
%A Gliozzo, Alfio
%Y Moens, Marie-Francine
%Y Huang, Xuanjing
%Y Specia, Lucia
%Y Yih, Scott Wen-tau
%S Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
%D 2021
%8 November
%I Association for Computational Linguistics
%C Online and Punta Cana, Dominican Republic
%F dash-etal-2021-open
%X Noun phrases and Relation phrases in open knowledge graphs are not canonicalized, leading to an explosion of redundant and ambiguous subject-relation-object triples. Existing approaches to solve this problem take a two-step approach. First, they generate embedding representations for both noun and relation phrases, then a clustering algorithm is used to group them using the embeddings as features. In this work, we propose Canonicalizing Using Variational AutoEncoders and Side Information (CUVA), a joint model to learn both embeddings and cluster assignments in an end-to-end approach, which leads to a better vector representation for the noun and relation phrases. Our evaluation over multiple benchmarks shows that CUVA outperforms the existing state-of-the-art approaches. Moreover, we introduce CanonicNell, a novel dataset to evaluate entity canonicalization systems.
%R 10.18653/v1/2021.emnlp-main.811
%U https://aclanthology.org/2021.emnlp-main.811
%U https://doi.org/10.18653/v1/2021.emnlp-main.811
%P 10379-10394