ParaNames: A Massively Multilingual Entity Name Corpus

Jonne Sälevä, Constantine Lignos


Abstract
We present ParaNames, a Wikidata-derived multilingual parallel name resource consisting of names for approximately 14 million entities spanning over 400 languages. ParaNames is useful for multilingual language processing, both for defining name translation tasks and as supplementary data for other tasks. We demonstrate an application of ParaNames by training a multilingual model for canonical name translation to and from English.
Anthology ID:
2022.sigtyp-1.15
Volume:
Proceedings of the 4th Workshop on Research in Computational Linguistic Typology and Multilingual NLP
Month:
July
Year:
2022
Address:
Seattle, Washington
Editors:
Ekaterina Vylomova, Edoardo Ponti, Ryan Cotterell
Venue:
SIGTYP
SIG:
SIGTYP
Publisher:
Association for Computational Linguistics
Pages:
103–105
URL:
https://aclanthology.org/2022.sigtyp-1.15
DOI:
10.18653/v1/2022.sigtyp-1.15
Cite (ACL):
Jonne Sälevä and Constantine Lignos. 2022. ParaNames: A Massively Multilingual Entity Name Corpus. In Proceedings of the 4th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, pages 103–105, Seattle, Washington. Association for Computational Linguistics.
Cite (Informal):
ParaNames: A Massively Multilingual Entity Name Corpus (Sälevä & Lignos, SIGTYP 2022)
PDF:
https://aclanthology.org/2022.sigtyp-1.15.pdf
Code:
bltlab/paranames